ABSTRACT

Image processing on a GPU often means using it as a general-purpose computing processor, which quickly raises the issue of data transfers, especially when kernel runtimes are short or when large data sets are processed. In some cases, transferring data between the CPU and the GPU takes longer than the computation on the GPU itself. Even so, the overall runtime can still be shorter than that of an equivalent process run on the CPU. To minimize overall runtime, it is therefore important to pay attention to how memory transfers are performed. This leads us to propose, in the following section, an overall code structure to be used with all our kernel examples.