ABSTRACT

This chapter explains how this limitation is overcome by the copy engine hardware found in GPUs of the NVIDIA Fermi architecture [NVIDIA 10] generation and later. A copy engine is a dedicated controller on the GPU that performs DMA transfers of data between CPU memory and GPU memory independently of the graphics engine (Figure 29.1). Each copy engine can transfer in either direction, but only in one direction at a time. The NVIDIA Fermi GeForce and the low-end Quadro cards¹ have one copy engine, so a unidirectional transfer can be performed concurrently with rendering, allowing for two-way overlap. The mid-range and high-end Quadro cards² have two copy engines, so bidirectional transfers can be done in parallel with rendering. This three-way overlap means that the current set of data can be processed while the previous set is downloaded from the GPU and the next set is uploaded.
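
The three-way overlap can be pictured as a three-stage pipeline. The following Python sketch is a scheduling model only, not GPU code. It assumes, purely for illustration, that upload, render, and download each take one equal time slot and that two copy engines are available, one per transfer direction. The function name three_way_schedule is hypothetical and does not come from any graphics API.

```python
from typing import List, Optional, Tuple

# One time slot: (frame being uploaded, frame being rendered, frame being downloaded).
# None means that stage is idle in this slot.
Slot = Tuple[Optional[int], Optional[int], Optional[int]]


def three_way_schedule(num_frames: int) -> List[Slot]:
    """Model three-way overlap as a three-stage pipeline.

    Assumptions (illustrative only): every stage takes one equal time slot,
    and two copy engines exist so upload and download can run simultaneously
    with rendering.
    """
    def frame_or_none(index: int) -> Optional[int]:
        # A stage is idle when the frame index falls outside the sequence.
        return index if 0 <= index < num_frames else None

    if num_frames <= 0:
        return []

    # In slot t: frame t is uploaded, frame t-1 is rendered,
    # and frame t-2 is downloaded.
    return [
        (frame_or_none(t), frame_or_none(t - 1), frame_or_none(t - 2))
        for t in range(num_frames + 2)
    ]


if __name__ == "__main__":
    for t, (up, draw, down) in enumerate(three_way_schedule(4)):
        print(f"slot {t}: upload={up} render={draw} download={down}")
```

Under these assumptions, N frames complete in N + 2 time slots, compared with the 3N slots a fully serial upload, render, download loop would need. Real stage durations differ, so the actual gain depends on which stage is the bottleneck.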