ABSTRACT

A cellular neural network (CNN) consists of identical cells arranged in a two-dimensional grid, each connected only to its neighboring cells and each holding an input, a current state, and a next state. Distant cells influence one another indirectly, through data propagation between neighboring cells. With the CNN approach, different applications, such as visual processing and optimization, are realized by the same algorithm with different sets of parameters. Although the local connectivity of the cells is well suited to implementation on a GPU, two additional issues must be addressed: first, the GPU computational model is based on four-channel data, whereas CNN data is conventionally organized in a one-channel format; second, data transfer between the GPU and main memory is much slower than data transfer between the CPU and main memory.
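
As a rough illustration of the cell update and the channel mismatch described above, the following CPU-side sketch may help. It assumes the standard Chua-Yang cell dynamics with a 3x3 neighborhood, forward-Euler integration, and zero-valued cells outside the grid; none of these choices is stated in this abstract. The names `cnn_step`, `pack_rgba`, `A`, `B`, `z`, and `dt` are hypothetical, and the packing of four horizontally adjacent cells into one four-channel texel is only one possible layout, not necessarily the one used in this work.

```python
def output(x):
    # Piecewise-linear CNN output (assumed): clamps the state to [-1, 1].
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def cnn_step(state, inp, A, B, z, dt=0.1):
    """Compute the next-state grid from the current-state and input grids.

    A and B are 3x3 feedback and control templates and z is a bias; changing
    these parameters changes the application while the algorithm stays fixed.
    Cells outside the grid are treated as zero (an assumed boundary rule).
    """
    rows, cols = len(state), len(state[0])
    nxt = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = z - state[i][j]
            # Only the 3x3 neighborhood contributes: connectivity is local.
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        acc += A[di + 1][dj + 1] * output(state[ni][nj])
                        acc += B[di + 1][dj + 1] * inp[ni][nj]
            nxt[i][j] = state[i][j] + dt * acc
    return nxt


def pack_rgba(grid):
    """Pack four horizontally adjacent one-channel values into one texel.

    Each row is zero-padded to a multiple of four (hypothetical layout).
    """
    packed = []
    for row in grid:
        padded = row + [0.0] * (-len(row) % 4)
        packed.append([tuple(padded[k:k + 4]) for k in range(0, len(padded), 4)])
    return packed
```

With a template that weights only the left neighbor, a value in the leftmost cell of a 1x3 grid reaches the rightmost cell only after two calls to `cnn_step`, which is the neighbor-to-neighbor propagation the text refers to.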