ABSTRACT

The performance of GPU-based volume-rendering algorithms is usually bounded by the fragment processor. Larger data sets, higher sampling rates, and higher image resolutions all increase the number of fragments that must be processed to represent the data accurately. As the number of fragments grows, memory bandwidth and latency become increasingly critical, because multiple data values are read from the volume for each fragment. Because both bandwidth and latency depend strongly on the memory access pattern, we have to find ways to access the volume data efficiently during rendering.

In addition to memory access, complex fragment programs with many ALU instructions can be a major bottleneck. The number of fragments for which complex shaders are executed must therefore be reduced. One way to achieve this is to perform “expensive” computations and memory accesses only selectively. Furthermore, techniques such as leaping over empty space, skipping occluded parts, and terminating rays that have accumulated sufficient opacity help to accomplish this goal.
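
As an illustration of two of these ideas, the following minimal CPU-side sketch composites the samples of a single ray front to back, skips fully transparent samples, and stops once the accumulated opacity crosses a threshold. It is not the GPU implementation discussed here: the function name `composite_ray`, the scalar color, and the 0.95 threshold are assumptions chosen for the example, and real empty-space leaping typically skips whole regions using a coarse auxiliary data structure rather than testing each sample.

```python
def composite_ray(samples, opacity_threshold=0.95):
    """Front-to-back compositing of one ray with early ray termination.

    samples: iterable of (color, alpha) pairs ordered front to back,
             with color and alpha in [0, 1]; alpha == 0 marks empty space.
    Returns (accumulated_color, accumulated_alpha, shaded_count), where
    shaded_count is the number of samples that were actually composited.
    """
    acc_color = 0.0
    acc_alpha = 0.0
    shaded = 0
    for color, alpha in samples:
        # Early ray termination: later samples are almost fully occluded.
        if acc_alpha >= opacity_threshold:
            break
        # Per-sample stand-in for empty-space skipping (hypothetical
        # simplification): transparent samples contribute nothing.
        if alpha == 0.0:
            continue
        # Standard front-to-back "over" operator.
        weight = (1.0 - acc_alpha) * alpha
        acc_color += weight * color
        acc_alpha += weight
        shaded += 1
    return acc_color, acc_alpha, shaded
```

For a ray whose samples become opaque early, `shaded_count` stays well below the total number of samples, which is the reduction in per-fragment work that these techniques aim for.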