ABSTRACT

Single-instruction multiple-thread (SIMT) architecture has emerged as a promising approach to high-throughput computing with high energy efficiency. The best-known SIMT processors are state-of-the-art graphics processing units (GPUs). The innovations of SIMT architecture come from both hardware and software. This chapter takes a top-down approach to describing SIMT architecture: it first presents the SIMT programming model, then discusses how SIMT workloads are mapped to SIMT processors, and finally examines the microarchitecture of SIMT processors. The SIMT programming model enables data parallelism to be expressed as task-level parallelism. It follows the single-program multiple-thread paradigm, meaning that all threads share the same program. An application developer writes scalar code, often referred to as kernel functions, that describes the work of a single thread. In a typical SIMT architecture, there are multiple SIMT cores on a chip. Each SIMT core is referred to as a streaming multiprocessor in CUDA terminology and as a compute unit in OpenCL terminology.
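The single-program multiple-thread idea can be illustrated with a minimal sketch. It is written in plain C++ and runs sequentially on a CPU, so it only mimics the programming model and says nothing about how an SIMT processor actually executes threads. The names saxpy_kernel and launch_saxpy, and the grouping of threads into blocks, are assumptions made for illustration and do not come from the chapter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar "kernel": the work of ONE logical thread, identified by its global id.
// Hypothetical name; a real SIMT kernel would be written in CUDA or OpenCL.
void saxpy_kernel(std::size_t tid, std::size_t n, float a,
                  const std::vector<float>& x, std::vector<float>& y) {
    if (tid < n) {  // guard: the launch may create more threads than elements
        y[tid] = a * x[tid] + y[tid];
    }
}

// Sequential stand-in for a kernel launch: every logical thread runs the
// same program and differs only in its thread id.
void launch_saxpy(std::size_t num_blocks, std::size_t threads_per_block,
                  std::size_t n, float a,
                  const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() >= n && y.size() >= n);
    for (std::size_t b = 0; b < num_blocks; ++b) {
        for (std::size_t t = 0; t < threads_per_block; ++t) {
            saxpy_kernel(b * threads_per_block + t, n, a, x, y);
        }
    }
}
```

The kernel body contains no loop over the data: it handles one element, and the parallelism comes entirely from how many threads are launched. On a real SIMT processor the two loops in launch_saxpy would be replaced by hardware thread scheduling.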