计算机工程与科学, 2024, Vol.46, Issue 9: 1547-1553. DOI: 10.3969/j.issn.1007-130X.2024.09.004


A neural network pruning and quantization algorithm for hardware deployment

王鹏 (WANG Peng) 1, 张嘉诚 (ZHANG Jiacheng) 2, 范毓洋 (FAN Yuyang) 1

Author Information

  • 1. Key Laboratory of Civil Aviation Aircraft Airworthiness Certification Technology, Civil Aviation University of China, Tianjin 300399, China; College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300399, China
  • 2. Key Laboratory of Civil Aviation Aircraft Airworthiness Certification Technology, Civil Aviation University of China, Tianjin 300399, China

Abstract

Due to their superior performance, deep neural networks have been widely applied in fields such as image recognition and object detection. However, they contain a large number of parameters and require immense computation, which makes deployment difficult on mobile edge devices that demand low latency and low power consumption. To address this issue, a compression algorithm that replaces multiplication with bit-shift and addition operations is proposed; it compresses neural network parameters to low bit-widths through pruning and quantization. The algorithm reduces hardware deployment difficulty when multiplier resources are limited, meets the low-latency and low-power requirements of mobile edge devices, and improves operational efficiency. Experiments on classical neural networks with the ImageNet dataset show that with parameters compressed to 4 bits, accuracy remains essentially unchanged compared with the full-precision networks; for ResNet18, ResNet50, and GoogleNet, the Top-1/Top-5 accuracies even improve by 0.38%/0.22%, 0.35%/0.21%, and 1.14%/0.57%, respectively. When the eighth convolutional layer of VGG16 is deployed on a Zynq7035, the compressed network shortens inference time by 51.1% and reduces power consumption by 46.7% while using 43% fewer DSP resources.
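
The abstract names the technique (pruning plus low-bit quantization so that multiplications become shifts and adds) but does not specify the exact quantizer. As a minimal sketch of how such a shift-add scheme can work, the Python fragment below quantizes weights to signed powers of two and evaluates a dot product using only shifts and additions. The function names, the exponent range, and the 4-bit layout (sign bit plus 3-bit exponent code, with one code reserved for pruned zeros) are illustrative assumptions, not the paper's definitions.

    import numpy as np

    def quantize_pow2(w, exp_min=-6, exp_max=0):
        """Quantize weights to signed powers of two: w_q = sign(w) * 2^e.

        A sign bit plus a 3-bit exponent code fits in 4 bits, with one
        code reserved for zero (a pruned weight). This layout is an
        assumption for illustration, not the paper's definition.
        """
        sign = np.sign(w)
        mag = np.abs(w)
        # Round the log2 magnitude to the nearest representable exponent.
        e = np.clip(np.round(np.log2(np.maximum(mag, 2.0 ** exp_min))),
                    exp_min, exp_max).astype(int)
        # Prune: anything below half the smallest level becomes exact zero.
        pruned = mag < 2.0 ** (exp_min - 1)
        wq = sign * 2.0 ** e
        wq[pruned] = 0.0
        return wq, e, pruned

    def shift_mac(x_ints, e, sign, pruned, exp_min=-6):
        """Dot product sum_i x_i * sign_i * 2^(e_i) with no multiplier.

        Shifting by (e - exp_min) keeps every shift a left shift; the
        accumulator is rescaled by 2^exp_min once at the end.
        """
        acc = 0
        for xi, ei, si, pi in zip(x_ints, e, sign, pruned):
            if pi:
                continue                      # pruned weight: skipped entirely
            term = xi << (ei - exp_min)       # multiply by 2^e via a shift
            acc += term if si > 0 else -term  # sign folded into add/subtract
        return acc * 2.0 ** exp_min           # single rescale, outside the loop

    # Example: four weights against integer (fixed-point) activations.
    w = np.array([0.30, -0.07, 0.001, 0.52])
    x = [100, 50, 25, 10]
    wq, e, pruned = quantize_pow2(w)
    assert shift_mac(x, e, np.sign(w), pruned) == float(np.dot(x, wq))

On an FPGA, each loop iteration maps to a shifter and an adder rather than a DSP multiplier, which is consistent with the resource trade-off the abstract reports (43% fewer DSPs on the Zynq7035).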

Key words

deep neural networks; hardware; pruning; quantization; FPGA

Publication year: 2024
Journal: 计算机工程与科学 (Computer Engineering & Science), published by the College of Computer, National University of Defense Technology
Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.787
ISSN: 1007-130X