
Convolutional neural network accelerator based on local similarity of data

To improve the processing speed of convolutional neural networks, a convolution method based on zero-gradient approximation (grad convolution) is used to increase data reuse and reduce the amount of computation. Gradients of the data are computed at the granularity of the convolution kernel, and a flexible gradient-threshold strategy is adopted for different layers of different networks so that the convolution results of adjacent windows can be reused reasonably. The key gradient-processing module and the convolution computation are implemented on a Field-Programmable Gate Array (FPGA) and combined with a systolic array to improve resource utilization, and a data flow suited to grad convolution is designed to address load imbalance. In object detection experiments based on the YOLOv3 model and the Pascal VOC dataset, the software side reduces the amount of computation by about 23.2% at the cost of a small accuracy loss, and the combined hardware speedup is about 17.8%.
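As a rough illustration of the grad-convolution idea summarized above (a minimal sketch, not the paper's actual FPGA implementation), the following Python/NumPy code compares each sliding window with its neighbour at kernel granularity and, when the difference falls below a layer-specific threshold, reuses the previous window's result instead of recomputing the multiply-accumulate. All names here (grad_conv2d, threshold, etc.) are illustrative assumptions, not identifiers from the paper.

# Minimal sketch of grad convolution: reuse the previous window's result
# when the window-to-window "gradient" is below a per-layer threshold.
import numpy as np

def grad_conv2d(x, w, threshold, stride=1):
    """Single-channel 2-D convolution that skips near-duplicate windows."""
    kh, kw = w.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    y = np.zeros((oh, ow))
    skipped = 0
    for i in range(oh):
        prev_win, prev_out = None, 0.0
        for j in range(ow):
            win = x[i*stride:i*stride+kh, j*stride:j*stride+kw]
            if prev_win is not None and np.max(np.abs(win - prev_win)) < threshold:
                # adjacent windows are locally similar: reuse previous result
                y[i, j] = prev_out
                skipped += 1
            else:
                y[i, j] = np.sum(win * w)   # full multiply-accumulate
                prev_out = y[i, j]
            prev_win = win
    return y, skipped

# Example: a smooth input lets many windows be skipped at little accuracy cost.
x = np.linspace(0, 1, 64).reshape(8, 8)
w = np.ones((3, 3)) / 9.0
y, skipped = grad_conv2d(x, w, threshold=0.05)
print(f"skipped {skipped} of {y.size} multiply-accumulate windows")

The threshold plays the same role as the per-layer gradient threshold described in the abstract: a larger value skips more windows and saves more computation, at the cost of a larger approximation error.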

accelerator; local similarity of data; convolutional neural network; grad convolution; field-programmable gate array (FPGA)

Cai Yuanpeng, Sun Wenhao, Chen Song


School of Microelectronics, University of Science and Technology of China, Hefei, Anhui 230026, China


National Key Research and Development Program of China; National Natural Science Foundation of China

2019YFB2204800; 61931008

2024

微电子学与计算机 (Microelectronics & Computer)
The 771 Research Institute of the 9th Academy, China Aerospace Science and Technology Corporation


CSTPCD
Impact factor: 0.431
ISSN: 1000-7180
Year, Volume (Issue): 2024, 41(4)