ABSTRACT

With the emergence of ASIC-, GPU-, and field-programmable gate array (FPGA)-based hardware neural network accelerators, how to execute neural network applications efficiently has become a topic of intense interest in both industry and academia. Related work has concentrated on optimizing memory access bandwidth, advanced storage devices, and programming models. This chapter focuses on acceleration methods for deep learning algorithms whose data structures are sparse or compressed, and on methods for mitigating the load imbalance that sparsity causes. Accelerators implemented as dedicated neural network chips map the computation-intensive convolutional layers onto an accelerating engine, and the remaining layers, such as activation, pooling, and fully connected layers, onto a separate module, so that the two parts are processed in parallel. The convolutional accelerating module, the convolution layer processor (CP), and the recurrent neural network and long short-term memory module, the fully connected layer and recurrent neural network processor (FRP), use distributed memory to satisfy the data requirements of their processing elements and computing cores.