Multiplication-free neural network training based on adaptive PoT quantization
Current deep neural network training requires a large number of full-precision multiply-accumulate (MAC) operations, so the linear layers (including the convolutional and fully connected layers) account for the vast majority of the overall energy consumption, exceeding 90%. This work proposes an adaptive layer-wise scaling quantization training method that replaces the full-precision multiplications in all linear layers with 4-bit fixed-point additions and 1-bit XOR operations. Experimental results show that the proposed method outperforms existing approaches in both energy consumption and accuracy, reducing the energy consumption of the linear layers during training by 95.8%. Convolutional neural networks on ImageNet and Transformer networks on WMT En-De incur less than 1% accuracy loss.
neural network; quantization; training acceleration; low energy consumption
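To illustrate the core idea behind power-of-two (PoT) quantization as described in the abstract, the following sketch shows how a product of two PoT-quantized values reduces to a 1-bit XOR of signs and a small fixed-point addition of exponents. The helper names (`pot_quantize`, `pot_multiply`, `pot_to_float`) and the simple round-to-nearest-exponent rule are hypothetical illustrations only; the paper's adaptive layer-wise scaling is not reproduced here.

```python
import numpy as np

def pot_quantize(x, exp_bits=4):
    """Quantize a tensor to signed power-of-two form: x ~= sign * 2**exponent.
    Illustrative only; does not include the paper's adaptive layer-wise scaling."""
    sign = np.signbit(x).astype(np.int8)                     # 1 bit: 0 -> +, 1 -> -
    mag = np.maximum(np.abs(x), 1e-12)                       # avoid log2(0)
    lo, hi = -(2 ** (exp_bits - 1)), 2 ** (exp_bits - 1) - 1  # 4-bit signed exponent range
    exponent = np.clip(np.round(np.log2(mag)), lo, hi).astype(np.int8)
    return sign, exponent

def pot_multiply(sign_a, exp_a, sign_b, exp_b):
    """Multiply two PoT-quantized values without any multiplication:
    the product's sign is a 1-bit XOR, its exponent a fixed-point addition."""
    return sign_a ^ sign_b, exp_a + exp_b

def pot_to_float(sign, exponent):
    """Decode back to floating point, for verification only."""
    return np.where(sign == 1, -1.0, 1.0) * np.exp2(exponent.astype(np.float32))

# Example: approximate 0.25 * (-2.0) with one XOR and one addition.
s1, e1 = pot_quantize(np.array([0.25]))
s2, e2 = pot_quantize(np.array([-2.0]))
s, e = pot_multiply(s1, e1, s2, e2)
print(pot_to_float(s, e))   # [-0.5]
```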