Neural Network Quantization Method Based on Rounding Error
郭秋丹¹, 濮约刚¹, 张启军¹, 丁传红¹, 吴栋¹
Author information
- 1. Institute 706, Second Academy of China Aerospace Science and Industry Corporation, Beijing 100854, China
Abstract
Deep neural networks incur high computational costs. Reducing the power consumption and latency of neural network inference is key to integrating neural networks into edge devices with stringent power and computational requirements. To address this, an end-to-end post-training quantization method based on rounding error is proposed to mitigate the accuracy degradation that occurs when neural networks are quantized to low bit widths. The method requires only a small batch of unlabeled data for training and performs well across different neural network architectures: with both weights and activations quantized to 4 bits, the classification accuracy of RegNetX-3.2GF drops by less than 2%.
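The abstract does not spell out the algorithm, so the following is only a minimal sketch of the quantity it refers to: the rounding error introduced when values are mapped to a 4-bit grid by plain round-to-nearest uniform quantization. It is not the proposed method. The function names, the symmetric per-tensor scale, and the sample weights are all assumptions made for illustration.

```python
def quantize_uniform(values, num_bits=4):
    """Symmetric uniform quantization with round-to-nearest (illustrative only)."""
    qmax = 2 ** (num_bits - 1) - 1      # largest signed code, 7 for 4 bits
    qmin = -(2 ** (num_bits - 1))       # smallest signed code, -8 for 4 bits
    max_abs = max(abs(v) for v in values)
    # Assumed scale choice: map the largest magnitude onto qmax.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code, then clip to the representable range.
    codes = [min(qmax, max(qmin, round(v / scale))) for v in values]
    dequantized = [c * scale for c in codes]
    return codes, dequantized, scale


def rounding_errors(values, dequantized):
    """Per-element difference between dequantized and original values."""
    return [d - v for v, d in zip(values, dequantized)]


# Hypothetical weights, chosen only to demonstrate the computation.
weights = [0.70, -0.33, 0.12, 0.04, -0.61]
codes, dequantized, scale = quantize_uniform(weights, num_bits=4)
errors = rounding_errors(weights, dequantized)

for w, c, e in zip(weights, codes, errors):
    print(f"weight={w:+.2f}  code={c:+d}  rounding error={e:+.4f}")
```

Under these assumptions no value is clipped, so every rounding error is bounded in magnitude by half the quantization step (`scale / 2`). At low bit widths the step is large, which is why the accuracy loss grows as the bit width shrinks.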
Key words
model compression/network distillation/network quantization/target recognition/quantization-aware training/post-training quantization/rounding error
Publication year
2024