ABSTRACT

This chapter introduces the Graphics Processing Unit (GPU) edge detection program, imedgeG.cu, and discusses its performance. It relates this performance to the building blocks of the GPU, such as the GPU cores and the streaming multiprocessors, which are the execution units that house groups of these cores. The chapter looks at the hardware at a high level to show how the cores and memory are organized and how data flows inside the GPU. The shared cache memory is the only shared memory in each GPU and serves as its Last Level Cache (LLC). The LLC inside the GPU is surprisingly small compared to CPU cache memory. The Host Interface is the controller inside the GPU that is responsible for interfacing to the Peripheral Component Interconnect Express (PCIe) bus. It allows the GPU to shuttle data back and forth between the CPU and itself.