ABSTRACT

Model compression is a promising strategy for reducing computational overhead and increasing processing speed in ML systems, which are often constrained by limited computational resources. Performing machine learning tasks on mobile and edge devices has become a prominent research topic in recent years. To maximize training acceleration, model compression methods can be integrated into the design of specialized hardware. For example, new-generation mobile phones and other computing platforms might leverage FPGA chips that support 8-bit or lower-precision floating-point operations to implement data quantization. Appropriate hardware is likewise necessary to fully exploit optimizations in neural network architecture and training algorithms. This chapter examines the problem from multiple perspectives to illustrate the relationship between real-world on-device learning requirements and hardware implementation, and researchers may draw on this discussion to develop on-device learning frameworks. Incorporating model compression into specialized hardware design thus remains an open issue that requires further investigation.
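
To make the data quantization idea mentioned above concrete, the following minimal sketch (illustrative only, not taken from the chapter) maps 32-bit floating-point weights to 8-bit values with a single symmetric per-tensor scale factor, so that low-precision hardware such as an FPGA could carry out most of the arithmetic; the function names and the integer-based scheme are assumptions for illustration, since real deployments may instead use low-precision floating-point formats.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization of a float32 array to int8 (illustrative)."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # one scale shared by the whole tensor
    x_q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return x_q, scale

def dequantize(x_q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from the 8-bit representation."""
    return x_q.astype(np.float32) * scale

# Example: a weight matrix stored in 8 bits instead of 32.
w = np.random.randn(4, 4).astype(np.float32)
w_q, s = quantize_int8(w)
w_hat = dequantize(w_q, s)
print(np.max(np.abs(w - w_hat)))  # per-element error is bounded by roughly scale / 2
```

In this simplified scheme the memory footprint of the weights drops by about 4x, at the cost of a bounded rounding error; hardware-aware designs typically co-tune the bit width, the scaling granularity, and the accelerator datapath together.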