CUDA Streams

doi:10.1201/9781315368290-11

ABSTRACT

In a Compute Unified Device Architecture (CUDA) program, the data must first be transferred from CPU memory into Graphics Processing Unit (GPU) memory; only once the data is in GPU memory can the GPU cores access and process it. When kernel execution is done, the processed data must be transferred back into CPU memory. A virtual address is only an "illusion" and cannot be used to access any data until it is translated into a physical address, which is the actual address used to access the data in dynamic random access memory (DRAM), that is, in main memory. Nvidia GPUs need two different types of engines to implement streaming: a kernel execution engine and a copy engine. The purpose of the copy engine is to queue the incoming and outgoing transfer operations and perform them when the Peripheral Component Interconnect Express (PCIe) bus is available.
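
The transfer, execute, transfer-back sequence described above can be sketched with CUDA streams as follows. This is a minimal illustration under stated assumptions, not the chapter's own code: the kernel `scaleKernel`, the helper `runStreamed`, the array size, and the choice of four streams are all hypothetical. The host buffer is allocated with `cudaMallocHost` so that it is page-locked; its physical addresses then stay fixed, which is what allows `cudaMemcpyAsync` to run asynchronously with respect to the host.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles every element of one chunk.
__global__ void scaleKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Return -1 from the enclosing function if a CUDA runtime call fails.
// (A sketch: resources allocated earlier are not released on this path.)
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "CUDA error: %s (line %d)\n",         \
                    cudaGetErrorString(err_), __LINE__);          \
            return -1;                                            \
        }                                                         \
    } while (0)

// Processes n floats in numStreams chunks.
// Returns the number of wrong results, or -1 on a CUDA error.
// Assumes n is divisible by numStreams and n < 2^23 (exact float values).
int runStreamed(int n, int numStreams)
{
    const int chunk = n / numStreams;
    const size_t chunkBytes = chunk * sizeof(float);

    float *hData = nullptr;  // pinned (page-locked) CPU memory
    float *dData = nullptr;  // GPU memory
    CHECK(cudaMallocHost((void **)&hData, n * sizeof(float)));
    CHECK(cudaMalloc((void **)&dData, n * sizeof(float)));
    for (int i = 0; i < n; ++i) hData[i] = (float)i;

    std::vector<cudaStream_t> streams(numStreams);
    for (int s = 0; s < numStreams; ++s) CHECK(cudaStreamCreate(&streams[s]));

    // Each stream queues: copy in -> kernel -> copy out for its own chunk.
    // Work queued in different streams is allowed to overlap.
    for (int s = 0; s < numStreams; ++s) {
        const int off = s * chunk;
        CHECK(cudaMemcpyAsync(dData + off, hData + off, chunkBytes,
                              cudaMemcpyHostToDevice, streams[s]));
        scaleKernel<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(dData + off, chunk);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(hData + off, dData + off, chunkBytes,
                              cudaMemcpyDeviceToHost, streams[s]));
    }
    CHECK(cudaDeviceSynchronize());  // wait for all streams to finish

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (hData[i] != 2.0f * (float)i) ++mismatches;

    for (int s = 0; s < numStreams; ++s) CHECK(cudaStreamDestroy(streams[s]));
    CHECK(cudaFree(dData));
    CHECK(cudaFreeHost(hData));
    return mismatches;
}

int main()
{
    int result = runStreamed(1 << 20, 4);  // 1M floats, 4 streams (arbitrary)
    printf("mismatches: %d\n", result);
    return result == 0 ? 0 : 1;
}
```

Within a single stream the three operations run in order. Whether the copy for one chunk actually overlaps the kernel for another depends on the device, in particular on how many copy engines it has.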