Parallelized QSplat

Abstract Download Usage Different Threads Performance Further Work References

Abstract

Rusinkiewicz and Levoy present a multiresolution point-based rendering system for large meshes in their paper, QSplat. Their system favors parallelization because multiple processors can process different subtrees of a model’s sphere hierarchy at the same time. In this project, the rendering algorithm of the author's original source code was modified to take advantage of multiprocessing using threads. Benchmarks run on a Dual Pentium 4 Xeon 2.8Ghz using software rendering resulted in an increase of the original performance of QSplat for larger models.


Download


Usage

  1. Download and uncompress .qs format models from the original QSplat homepage.
  2. Download and uncompress the modified QSplat executable for Win32.
  3. Open the modified QSplat executable.
  4. Render using software: Under the Driver menu, choose Software, and then Z-Buffer.
  5. Render Threaded: Under the Options menu, choose Render Threaded
  6. Under the File menu, choose Open, then browse for a downloaded .qs file in step 1.
  7. The model should be displayed.
  8. A file named "output.txt" should be made in the same folder. This file contains the rendering times of the model.

Running Different Number of Threads

Per request, here are a number of images with Parallelized QSplat running with different number of threads rendering only part of the whole Buddha model.

1 Thread

2 Threads

3 Threads


Performance

Test setup

Results

Model Threading Splats Rendered Render time (sec.) Splats per second Speed up
Buddha Non-Threaded 1,466,9441.1511,274,495 262%
Threaded 1,281,3840.3843,336,937
Bunny Non-Threaded 110,4880.384287,729 92%
Threaded 102,0890.384265,856
Dragon Non-Threaded 1,707,9811.3421,272,713 134%
Threaded 1,470,8710.8641,702,396
Lion Non-Threaded 384,7800.576668,020 109%
Threaded 3495400.480728,208
Lucy Non-Threaded 2,101,1641.5341,369,728 133%
Threaded 1,745,4650.9601,818,192


Detailed numerical results can be found here.

Discussion

Looking at the perfomance chart, models with more vertices (i.e. Buddha, Dragon, and Lucy) benefit more from the threading of the QSplat rendering algorithm. Models with less vertices (i.e. Bunny and Lion) do not benefit as much. Rendering the Bunny model using the threaded algorithm actually decreases its performance.

An explanation for this behavior would be that the threaded algorithm has extra overhead when setting up threads to run the rendering. There exists a threshold of vertex points that should be met before an increase in performance is apparent.


Further Work

Currently, the naive implementation of Parallelized QSplat looks at the number of children of the root node of a model's sphere hierarchy and kicks off a thread for each child. This causes a lot of overheaded when using level-of-detail rendering because threads are made each time a model is refined. A better implementation of the parallelization (maybe using a static amount of threads or some other parallelization mechanism) would further increase performance.

Another optimization would consist of finding the number of points such that rendering a smaller model would not cause a decrease in performance. Once found, the rendering algorithm can be changed so that if the number of vertices in a model is less than the found threshold, then a non-threaded branch of the program would render the model.


References


Kevin Le
kle@calpoly.edu
Last updated 3/14/2005