An overview of joint optimization methods for neural network compression
With the increasing demand for real-time performance, privacy, and security in AI applications, deploying high-performance neural networks on edge computing platforms has become a research hotspot. Since common edge computing platforms are limited in storage, computing power, and power consumption, the edge deployment of deep neural networks remains a major challenge. One current way to overcome these challenges is to compress an existing neural network so that it fits the deployment conditions of the device. Commonly used model compression algorithms include pruning, quantization, and knowledge distillation. By exploiting the complementarity of multiple methods, joint compression can achieve better compression and acceleration than any single method, and has therefore become a focus of research. This paper first gives a brief overview of the commonly used model compression algorithms, and then summarizes three commonly used joint compression schemes: "knowledge distillation + pruning", "knowledge distillation + quantization", and "pruning + quantization", focusing on the analysis and discussion of the basic ideas and methods of joint compression. Finally, key future development directions for joint optimization methods in neural network compression are proposed.
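To make the "pruning + quantization" combination concrete, the sketch below (not taken from any method surveyed in this paper; the toy model and compression ratios are illustrative assumptions) chains two standard PyTorch utilities: unstructured magnitude pruning followed by post-training dynamic quantization. In practice, fine-tuning or distillation would typically be interleaved between the two stages.

```python
# Minimal "pruning + quantization" joint compression sketch in PyTorch.
# The model, pruning ratio, and quantization settings are illustrative only.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small example network standing in for the network to be compressed.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Step 1: L1-magnitude unstructured pruning removes 50% of the weights
# in each Linear layer (fine-tuning would normally follow this step).
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the pruning permanent

# Step 2: post-training dynamic quantization maps the remaining weights
# to 8-bit integers, further shrinking storage and speeding up inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 784)
print(quantized(x).shape)  # torch.Size([1, 10])
```

The ordering here (prune first, then quantize) reflects the common observation that pruning changes the weight distribution that the quantizer must cover, so quantization parameters are usually calibrated after sparsification.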