ABSTRACT

This chapter focuses on the memory architecture of different Nvidia Graphics Processing Unit (GPU) families and shows how programmers can improve their kernels so that accesses to data in the different memory regions are efficient, i.e., how to make kernels memory friendly. It explains how to use an important tool, the Compute Unified Device Architecture (CUDA) Occupancy Calculator, which allows programmers to establish a formal methodology for determining kernel launch parameters that ensure optimum resource utilization inside the streaming multiprocessors (SMs) of the GPU. The most important type of memory in a GPU is shared memory. Shared memory is an SM-level resource: each SM has its own shared memory, and every block requests a certain amount of it. The instruction cache and instruction buffer also reside in the SMs of the GPU; they cache the machine code instructions needed to run the kernels on an SM.
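
The idea that each block requests a portion of its SM's shared memory can be illustrated with a minimal sketch (not taken from the chapter; the kernel name, the tile size `TILE`, and the reversal operation are assumptions chosen for illustration). The `__shared__` declaration is the block's request for `TILE * sizeof(float)` bytes of the SM's shared memory, and that amount is one of the resources the Occupancy Calculator balances against registers and thread count when choosing launch parameters:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 256  // threads per block and tile size (an assumption for this sketch)

// Each block stages one tile of the input into shared memory, then writes it
// back to global memory in reversed order. The __shared__ array is this
// block's request for TILE * sizeof(float) bytes of its SM's shared memory.
__global__ void reverseTile(const float *in, float *out)
{
    __shared__ float tile[TILE];          // per-block shared-memory request
    int t    = threadIdx.x;
    int base = blockIdx.x * blockDim.x;
    tile[t] = in[base + t];               // coalesced read from global memory
    __syncthreads();                      // wait until the whole tile is staged
    out[base + t] = tile[TILE - 1 - t];   // read the tile back reversed
}

int main(void)
{
    const int N = 4 * TILE;
    float *in, *out;
    cudaMallocManaged(&in,  N * sizeof(float));
    cudaMallocManaged(&out, N * sizeof(float));
    for (int i = 0; i < N; ++i) in[i] = (float)i;

    // Launch parameters: N/TILE blocks of TILE threads each. How many such
    // blocks an SM can host at once depends in part on the shared memory
    // each block requested above.
    reverseTile<<<N / TILE, TILE>>>(in, out);
    cudaDeviceSynchronize();

    printf("out[0] = %g\n", out[0]);      // last element of the first input tile
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Because shared memory is a finite per-SM resource, a larger `TILE` here would reduce the number of blocks that can reside on an SM simultaneously, which is exactly the trade-off the Occupancy Calculator makes explicit.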