The performance of GPU-based volume-rendering algorithms is usually bounded by the fragment processor. With larger data sets, higher sampling rates, and higher image resolutions, more fragments must be processed to represent the data accurately. As the number of fragments grows, memory bandwidth and latency become increasingly critical, because multiple data values from the volume are read for each fragment. Because both bandwidth and latency depend strongly on the memory access patterns employed, we have to find ways to access the volume data during rendering in an optimized way.
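To get a feeling for the orders of magnitude involved, the following back-of-the-envelope sketch estimates how many fragments and voxel reads a single frame can require. It is only a rough upper bound under stated assumptions: every pixel is covered by the volume, every ray takes the same number of samples, each sample uses trilinear interpolation (eight voxels), and voxels are 8-bit scalars. The function name and its parameters are hypothetical and chosen for illustration. The estimate ignores texture caches, early ray termination, and empty-space skipping, all of which reduce the actual memory traffic.

```python
def estimate_volume_fetches(width, height, samples_per_ray,
                            voxels_per_sample=8, bytes_per_voxel=1):
    """Rough upper bound on per-frame fragment count and voxel traffic.

    Assumptions (illustrative only): the volume covers the whole viewport,
    every ray takes `samples_per_ray` samples, and each sample reads
    `voxels_per_sample` voxels (8 for trilinear interpolation).
    """
    # One fragment per pixel and per sample along the viewing ray.
    fragments = width * height * samples_per_ray
    # Each fragment reads several voxels from the volume texture.
    voxel_reads = fragments * voxels_per_sample
    # Uncached memory traffic in bytes.
    bytes_read = voxel_reads * bytes_per_voxel
    return fragments, voxel_reads, bytes_read


if __name__ == "__main__":
    fragments, reads, nbytes = estimate_volume_fetches(512, 512, 256)
    print(f"fragments per frame:   {fragments:,}")
    print(f"voxel reads per frame: {reads:,}")
    print(f"uncached traffic:      {nbytes / 2**20:.0f} MiB per frame")
```

Under these assumptions, a 512 × 512 viewport with 256 samples per ray produces roughly 67 million fragments and about 537 million voxel reads per frame. Doubling the sampling rate or the number of pixels doubles both figures, which is why the access pattern, and with it the cache hit rate, matters so much.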