ABSTRACT

In this chapter, we introduce our efforts in accelerating neural network inference. From the hardware design perspective, we present an instruction-set-architecture (ISA) deep learning accelerator that supports a wide range of DNN models through a customized ISA and an optimized software compiler. From the algorithm perspective, we introduce several practices we have applied: sensitivity-based pruning without a hardware model, quantization, iterative pruning with a hardware model, and neural architecture search.

Take-aways

Discusses hardware design: an instruction-set-architecture deep learning accelerator that supports a wide range of DNN models through a customized ISA and an optimized software compiler

Discusses software practices: sensitivity-based pruning without a hardware model, quantization, iterative pruning with a hardware model, and neural architecture search.
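As a simple illustration of the kind of quantization discussed in this chapter, the sketch below shows generic post-training symmetric INT8 quantization of a weight tensor. The function names and the toy tensor are illustrative assumptions, not the chapter's actual implementation.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map float weights to [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0  # single scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor, e.g., to check the accuracy impact."""
    return q.astype(np.float32) * scale

# Toy usage: quantize a random weight tensor and measure the reconstruction error.
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(w - dequantize_int8(q, s)).max()
print(f"scale={s:.6f}, max abs error={err:.6f}")
```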