微电子学与计算机 (Microelectronics & Computer), 2024, Vol. 41, Issue (4): 104-111. DOI: 10.19304/J.ISSN1000-7180.2023.0005

Convolutional neural network accelerator based on local similarity of data

蔡元鹏 (Cai Yuanpeng)¹, 孙文浩 (Sun Wenhao)¹, 陈松 (Chen Song)¹

Author information

  • 1. School of Microelectronics, University of Science and Technology of China, Hefei, Anhui 230026, China

Abstract

To improve the processing speed of convolutional neural networks, this work uses a convolution method with zero-gradient approximation (gradient convolution) that raises the data reuse rate and reduces the amount of computation. Gradients of the data are computed with the convolution kernel as the unit, and a flexible gradient-threshold strategy is applied to different layers of different networks so that the convolution results of adjacent windows are reused where appropriate. The key gradient-processing module and the convolution computation are implemented on a field-programmable gate array (FPGA) and combined with a systolic array to improve resource utilization, and a dataflow suited to gradient convolution is designed to address load imbalance. In object detection experiments based on the YOLOv3 model and the Pascal VOC dataset, at the cost of a small accuracy loss, the software side reduces computation by about 23.2%, and the acceleration ratio achieved in combination with the hardware is about 17.8%.
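The abstract describes gradient convolution only in outline. As an illustration of the general idea (reusing the result of an adjacent window when the input barely changes), here is a minimal NumPy sketch under explicit assumptions: the function name `grad_conv2d`, the "gradient" measure (mean absolute difference between the current window and the last fully computed window in the same row), and the single scalar `threshold` are all hypothetical choices, not the authors' definitions. The per-layer threshold strategy and the FPGA systolic-array dataflow are not modeled.

```python
import numpy as np


def grad_conv2d(x: np.ndarray, w: np.ndarray, threshold: float):
    """Stride-1 'valid' 2-D convolution (CNN-style cross-correlation) that
    reuses a neighbouring result when the input window is nearly unchanged.

    Hypothetical gradient measure: mean absolute difference between the
    current window and the last fully computed window in the same row.
    Returns (output map, number of windows whose MACs were skipped).
    """
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    y = np.empty((oh, ow), dtype=np.float64)
    skipped = 0
    for i in range(oh):
        ref_win, ref_out = None, 0.0  # last window that was actually computed
        for j in range(ow):
            win = x[i:i + kh, j:j + kw]
            if ref_win is not None and np.mean(np.abs(win - ref_win)) <= threshold:
                # "Zero gradient": treat the window as unchanged and reuse.
                y[i, j] = ref_out
                skipped += 1
            else:
                # Gradient too large: do the full multiply-accumulate.
                ref_out = float(np.sum(win * w))
                ref_win = win
                y[i, j] = ref_out
    return y, skipped


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Smooth synthetic input: a slow horizontal ramp plus small noise.
    x = np.linspace(0.0, 1.0, 32)[None, :] + 0.01 * rng.standard_normal((32, 32))
    w = rng.standard_normal((3, 3))
    y, skipped = grad_conv2d(x, w, threshold=0.05)
    print(f"skipped {skipped} of {y.size} windows")
```

Comparing against the last computed window, rather than the immediately preceding one, keeps the approximation error from drifting along a row: each reused output differs from the exact result by at most `threshold * kh * kw * max|w|`. With `threshold = 0`, only exactly repeated windows are reused, so the output is exact.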

Key words

accelerator/local similarity of data/convolutional neural network/grad convolution/field-programmable gate array(FPGA)


Funding

National Key Research and Development Program of China (2019YFB2204800)

National Natural Science Foundation of China (61931008)

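Year of publication

2024
微电子学与计算机 (Microelectronics & Computer)
No. 771 Research Institute, Ninth Academy of China Aerospace Science and Technology Corporation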

CSTPCD
Impact factor: 0.431
ISSN: 1000-7180
Number of references: 7