Computer Systems & Applications (计算机系统应用), 2024, Vol. 33, Issue 12: 161-169. DOI: 10.15888/j.cnki.csa.009720

Post-training Mixed-accuracy Quantization Algorithm Based on Pyramid-pooled Weight Imprinting

张瑞轩¹ 赵宇峰¹ 徐飞¹ 禹婷婷¹ 张乐怡¹

Author information

  • 1. School of Computer Science and Engineering, Xi'an Technological University, Xi'an 710021

Abstract

Model quantization is widely used for fast inference and deployment of deep neural network models. Post-training quantization has attracted considerable attention because it needs little retraining time and incurs only a small performance loss. However, most existing post-training quantization methods either rely on theoretical assumptions or assign fixed bit-widths to network layers, which leads to significant performance loss in the quantized network, especially at low bit-widths. To improve the accuracy of post-training quantized models, this study proposes a post-training mixed-precision quantization method (MSQ). The method inserts a task predictor module, which combines a pyramid pooling module with weight imprinting, after each layer of the network. The predictor estimates the accuracy attainable at each layer; this estimate is used to assess the importance of the layer, and the quantization bit-width of each layer is then set according to its importance. Experiments show that MSQ outperforms several existing mixed-precision quantization methods on multiple popular network architectures, and that the quantized models perform better and have lower latency when tested on edge hardware devices.
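The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the pyramid levels (1, 2, 4), the candidate bit-widths (2, 4, 8) and the rank-based allocation rule are all assumptions made for illustration, since the abstract does not specify how importance is mapped to bit-widths. The sketch pools a layer's feature maps at several grid sizes, builds a classifier by weight imprinting (class weights set to normalized class-mean embeddings), measures that classifier's accuracy, and then assigns more bits to layers with higher importance scores.

```python
import numpy as np

def pyramid_pool(fmap, levels=(1, 2, 4)):
    """Average-pool a (C, H, W) feature map over n x n grids and concatenate.

    Assumes H and W are at least max(levels) so that no bin is empty.
    """
    _, h, w = fmap.shape
    parts = []
    for n in levels:
        rows = np.array_split(np.arange(h), n)  # n nearly equal row bins
        cols = np.array_split(np.arange(w), n)  # n nearly equal column bins
        for r in rows:
            for q in cols:
                parts.append(fmap[:, r][:, :, q].mean(axis=(1, 2)))
    return np.concatenate(parts)

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def imprint_weights(embeddings, labels, num_classes):
    """Weight imprinting: one weight vector per class, equal to the
    normalized mean of that class's normalized embeddings.
    Assumes every class has at least one sample."""
    labels = np.asarray(labels)
    emb = l2_normalize(embeddings)
    w = np.stack([emb[labels == k].mean(axis=0) for k in range(num_classes)])
    return l2_normalize(w)

def layer_accuracy(feats, labels, num_classes):
    """Accuracy of an imprinted cosine classifier on one layer's features.

    feats has shape (N, C, H, W). For brevity the same samples are used for
    imprinting and evaluation; a held-out split would be more faithful.
    """
    labels = np.asarray(labels)
    emb = np.stack([pyramid_pool(f) for f in feats])
    w = imprint_weights(emb, labels, num_classes)
    pred = (l2_normalize(emb) @ w.T).argmax(axis=1)
    return float((pred == labels).mean())

def allocate_bits(importance, choices=(2, 4, 8)):
    """Hypothetical rule: rank layers by importance and split the ranking
    evenly over the candidate bit-widths (least important gets fewest bits)."""
    order = np.argsort(importance, kind="stable")  # ascending importance
    bits = np.empty(len(importance), dtype=int)
    for group, b in zip(np.array_split(order, len(choices)), sorted(choices)):
        bits[group] = b
    return bits.tolist()

# Made-up importance scores for six layers (not results from the paper).
scores = [0.05, 0.30, 0.10, 0.02, 0.20, 0.01]
print(allocate_bits(scores))  # [4, 8, 4, 2, 8, 2]
```

In practice the importance scores would be derived from the per-layer accuracies returned by `layer_accuracy`; how the paper derives them is not stated in the abstract, so that step is left open here.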

Key words

model quantization / mixed-precision quantization / pyramid pooling / weight imprinting / bit-width allocation

Publication year

2024
Computer Systems & Applications (计算机系统应用)
Institute of Software, Chinese Academy of Sciences

CSTPCD
Impact factor: 0.449
ISSN: 1003-3254