ABSTRACT

The compute shader threading model maps easily onto an image domain: the x and y global thread addresses index the image’s pixels (the threading model itself is covered in detail in Chapter 5). This makes it simple to develop an implementation that uses the massively parallel processing capabilities of modern GPUs and that can provide a significant performance improvement over traditional CPU implementations. In addition, the compute shader provides HLSL atomic operations, group shared memory, device memory resources, and synchronization intrinsic functions to facilitate communication between threads. When an algorithm makes appropriate use of these additional tools, even further performance increases can be achieved.
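
As a rough illustration of the mapping described above, the following HLSL sketch shows two compute shader entry points. The first uses the x and y components of SV_DispatchThreadID to address one pixel per thread. The second accumulates a luminance histogram with group shared memory, atomic operations, and a group synchronization intrinsic. The resource names (InputImage, OutputImage, Histogram), the register slots, the entry point names, the 16x16 thread group size, and the luminance weights are all assumptions chosen for this example and do not come from the text.

```hlsl
// Hypothetical resources; names and register slots are assumptions.
Texture2D<float4>        InputImage  : register(t0);
RWTexture2D<float4>      OutputImage : register(u0);
RWStructuredBuffer<uint> Histogram   : register(u1); // 256 bins, cleared by the application

// Assumed luminance weights (Rec. 601).
static const float3 LumaWeights = float3(0.299f, 0.587f, 0.114f);

// One thread per pixel: the global thread address is the pixel coordinate.
[numthreads(16, 16, 1)]
void CSGrayscale(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    uint width, height;
    OutputImage.GetDimensions(width, height);

    // Skip threads that fall outside the image when its size
    // is not a multiple of the thread group size.
    if (dispatchThreadID.x >= width || dispatchThreadID.y >= height)
        return;

    float4 color = InputImage[dispatchThreadID.xy];
    float  luma  = dot(color.rgb, LumaWeights);
    OutputImage[dispatchThreadID.xy] = float4(luma, luma, luma, color.a);
}

// Per-group histogram kept in group shared memory.
groupshared uint LocalBins[256];

// 16 x 16 = 256 threads per group, so each thread owns one bin
// for the clear and merge steps.
[numthreads(16, 16, 1)]
void CSHistogram(uint3 dispatchThreadID : SV_DispatchThreadID,
                 uint  groupIndex       : SV_GroupIndex)
{
    // Clear this thread's bin, then wait for the whole group.
    LocalBins[groupIndex] = 0;
    GroupMemoryBarrierWithGroupSync();

    uint width, height;
    InputImage.GetDimensions(width, height);

    if (dispatchThreadID.x < width && dispatchThreadID.y < height)
    {
        float4 color = InputImage[dispatchThreadID.xy];
        float  luma  = dot(color.rgb, LumaWeights);
        uint   bin   = (uint)(saturate(luma) * 255.0f);

        // Many threads may hit the same bin, so the increment is atomic.
        InterlockedAdd(LocalBins[bin], 1);
    }

    // Wait until every thread in the group has recorded its pixel.
    GroupMemoryBarrierWithGroupSync();

    // Merge the group's partial result into device memory.
    InterlockedAdd(Histogram[groupIndex], LocalBins[groupIndex]);
}
```

With this group size, the application would dispatch roughly ceil(width / 16) by ceil(height / 16) thread groups to cover the image. Accumulating in group shared memory first means each group issues only 256 atomic operations against device memory, instead of one per pixel, which is one way the additional tools can reduce contention.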