REVIEW OF QUANTIZATION-BASED DEEP NEURAL NETWORK OPTIMIZATION RESEARCH
The escalating parameter counts and computational demands of deep neural networks pose significant challenges for deploying these models on resource-constrained devices. To address this challenge, quantization has emerged as a prominent technique for compressing and accelerating deep neural networks by reducing the bit-width of model parameters and intermediate feature maps. This article presents a comprehensive review of quantization-based optimization for deep neural networks. First, common quantization methods and their research progress are discussed, and their similarities, differences, advantages, and disadvantages are analyzed. Next, different quantization granularities, such as layer-wise quantization, group-wise quantization, and channel-wise quantization, are examined. Finally, the relationship between training and quantization is analyzed, and the achievements and open challenges of current research are discussed, with the aim of laying a theoretical foundation for future studies on deep neural network quantization.
Keywords: deep neural network; model quantization; quantization-aware training; post-training quantization
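As a concrete illustration of the mechanism described above, the following is a minimal NumPy sketch of symmetric uniform quantization that contrasts a per-tensor (layer-wise) scale with per-channel scales. The function names, the toy weight matrix, and the choice of symmetric signed quantization are illustrative assumptions, not taken from the works under review.

import numpy as np

def uniform_quantize(x, num_bits=8, axis=None):
    """Symmetric uniform quantization of a float tensor to signed integers.

    axis=None -> a single scale for the whole tensor (per-tensor / layer-wise)
    axis=0    -> one scale per output channel (channel-wise)
    """
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 127 for int8
    if axis is None:
        max_abs = np.max(np.abs(x))
    else:
        # Reduce over every axis except `axis`; keep dims for broadcasting.
        reduce_axes = tuple(i for i in range(x.ndim) if i != axis)
        max_abs = np.max(np.abs(x), axis=reduce_axes, keepdims=True)
    scale = np.maximum(max_abs / qmax, 1e-12)  # guard against all-zero rows
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map the integer codes back to approximate float values."""
    return q.astype(np.float32) * scale

# Toy weight matrix: 4 output channels x 8 inputs (hypothetical example).
w = np.random.randn(4, 8).astype(np.float32)

q_t, s_t = uniform_quantize(w, axis=None)  # layer-wise granularity
q_c, s_c = uniform_quantize(w, axis=0)     # channel-wise granularity

err_t = np.abs(w - dequantize(q_t, s_t)).mean()
err_c = np.abs(w - dequantize(q_c, s_c)).mean()
print(f"per-tensor error: {err_t:.5f}, per-channel error: {err_c:.5f}")

In this sketch, channel-wise scales typically yield a lower reconstruction error than a single per-tensor scale, at the cost of storing one scale per channel; this trade-off between accuracy and overhead is exactly what the granularity comparison in the review concerns.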