ABSTRACT

In this chapter, we introduce our efforts in accelerating neural network inference. From the hardware design perspective, we present an instruction-set-architecture (ISA) deep learning accelerator that supports a wide range of DNN models through a customized ISA and an optimized software compiler. From the algorithm perspective, we introduce several practices we have applied: sensitivity-based pruning without a hardware model, quantization, iterative pruning with a hardware model, and neural architecture search.

Take-aways

Discusses hardware design: an instruction-set-architecture deep learning accelerator that supports a wide range of DNN models through a customized ISA and an optimized software compiler

Discusses software practices: sensitivity-based pruning without a hardware model, quantization, iterative pruning with a hardware model, and neural architecture search.
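As a simple illustration of the kind of quantization discussed in this chapter, the sketch below shows generic post-training symmetric INT8 quantization of a weight tensor. The function names and the toy tensor are illustrative assumptions, not the chapter's actual implementation.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map float weights to [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0  # single scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor, e.g., to check the accuracy impact."""
    return q.astype(np.float32) * scale

# Toy usage: quantize a random weight tensor and measure the reconstruction error.
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(w - dequantize_int8(q, s)).max()
print(f"scale={s:.6f}, max abs error={err:.6f}")
```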