ABSTRACT

This chapter focuses on OpenCL, the most popular Graphics Processing Unit (GPU) programming framework after Compute Unified Device Architecture (CUDA), and examines how it simplifies writing multiplatform parallel programs. OpenCL was released in 2009 by the Khronos Group as a framework for writing parallel programs that run on many different platforms. Unlike CUDA, which runs only on Nvidia GPUs, OpenCL code can run on CPUs, GPUs, and other devices such as field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), provided the device supports OpenCL. That support is not automatic: each device manufacturer must implement the drivers that allow OpenCL to work on its devices, and these vendor implementations are known as platforms. The two frameworks also differ in how work is executed. CUDA calls are either synchronous and blocking or asynchronous through streams, whereas OpenCL execution is queue-based: every command is dispatched to a command queue and, in the default in-order mode, executes once it reaches the head of the queue.