A Review of Research on Quantization-Based Deep Neural Network Optimization
Xian Conghui¹, Wang Tianyi¹, Li Chao², Lyu Yue², Sun Jiande¹
Author information
- 1. School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
- 2. Shandong Luruan Digital Technology Co., Ltd., Jinan 250001, China
Abstract
The escalating size of model parameters and the growing demand for computational resources pose significant challenges for deploying deep models on resource-constrained devices. To tackle this challenge, quantization has emerged as a prominent technique that compresses and accelerates deep neural networks by reducing the bit-width of model parameters and intermediate feature maps. This article presents a comprehensive review of quantization-based optimization for deep neural networks. First, common quantization methods and their research progress are discussed, and their similarities, differences, advantages, and disadvantages are analyzed. Next, different quantization granularities, such as layer-wise, group-wise, and channel-wise quantization, are examined. Finally, the relationship between training and quantization is analyzed, and the achievements and open challenges of current research are discussed, with the aim of laying a theoretical foundation for future studies on deep neural network quantization.
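As a concrete illustration of the mechanism the abstract describes, the sketch below shows uniform affine quantization of a weight tensor to low-bit integers, computed either with one scale for the whole tensor (per-tensor, i.e., layer-wise) or with one scale per output channel (channel-wise). This is a minimal NumPy sketch for illustration only; the function names and the synthetic data are our own assumptions, not taken from the surveyed work.

```python
import numpy as np

def quantize_uniform(x, num_bits=8, axis=None):
    """Uniform affine quantization of a float tensor to num_bits integers.

    axis=None -> one scale/zero-point for the whole tensor (per-tensor, layer-wise)
    axis=0    -> one scale/zero-point per output channel (channel-wise)
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Collect min/max statistics, kept separate per channel when an axis is given.
    reduce_dims = None if axis is None else tuple(d for d in range(x.ndim) if d != axis)
    x_min = x.min(axis=reduce_dims, keepdims=True)
    x_max = x.max(axis=reduce_dims, keepdims=True)
    scale = (x_max - x_min) / (qmax - qmin)
    scale = np.where(scale == 0, 1e-8, scale)  # guard against constant channels
    zero_point = np.round(qmin - x_min / scale)
    q = np.clip(np.round(x / scale + zero_point), qmin, qmax).astype(np.int32)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map the integers back to (approximate) floating-point values."""
    return (q.astype(np.float32) - zero_point) * scale

# Four output channels with very different ranges: per-channel scales adapt,
# while a single per-tensor scale must cover the largest channel.
w = (np.random.randn(4, 16) * np.array([[0.1], [1.0], [5.0], [0.01]])).astype(np.float32)
q_t, s_t, z_t = quantize_uniform(w, axis=None)  # per-tensor (layer-wise)
q_c, s_c, z_c = quantize_uniform(w, axis=0)     # per-channel
print("per-tensor  mean abs error:", np.abs(w - dequantize(q_t, s_t, z_t)).mean())
print("per-channel mean abs error:", np.abs(w - dequantize(q_c, s_c, z_c)).mean())
```

On data like this, the per-channel variant typically shows a markedly lower reconstruction error, which is the practical motivation for the finer quantization granularities the review compares.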
Key words
deep neural network / model quantization / quantization-aware training / post-training quantization
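The keywords distinguish quantization-aware training (QAT) from post-training quantization (PTQ), a contrast the abstract frames as the relationship between training and quantization. The sketch below shows the core device that separates them: a fake-quantization step whose backward pass uses a straight-through estimator (STE), so that rounding, whose gradient is zero almost everywhere, does not block backpropagation. This is a hedged PyTorch sketch under common conventions; the class name FakeQuantSTE and the symmetric per-tensor scheme are illustrative assumptions, not the method of any specific surveyed paper.

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    """Symmetric per-tensor fake quantization with a straight-through estimator.

    Forward: round values to num_bits integers and dequantize back to float,
    so the network experiences quantization noise during training.
    Backward: pass gradients through the rounding as if it were the identity,
    which is what makes quantization-aware training possible.
    """

    @staticmethod
    def forward(ctx, x, num_bits=8):
        qmax = 2 ** (num_bits - 1) - 1                 # e.g. 127 for 8 bits
        scale = x.abs().max().clamp(min=1e-8) / qmax   # per-tensor symmetric scale
        q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
        return q * scale                               # dequantized "fake" values

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None                       # STE: identity gradient

# Usage inside a training step: quantize the weights on the fly.
w = torch.randn(64, 32, requires_grad=True)
w_q = FakeQuantSTE.apply(w, 8)
loss = (w_q ** 2).mean()
loss.backward()            # gradients reach w through the STE
print(w.grad.shape)
```

In PTQ, by contrast, a model trained in floating point is quantized afterwards using only calibration statistics, with no gradient flowing through the rounding step at all.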
Year of publication
2024