Design of a Convolutional Neural Network Accelerator for Microcontrollers

Embedded microcontrollers currently lack the performance needed for real-time image recognition. To address this, a convolutional neural network accelerator suited to microcontrollers is proposed. In the convolutional layer, the accelerator uses a non-blocking, row-parallel multiply-add tree structure, which achieves higher hardware utilization. To sustain the data throughput that row parallelism requires, a dedicated convolution SRAM is designed. The pooling and activation units are integrated into the data path, which reduces the time overhead of repeated data accesses. FPGA prototype verification shows that the accelerator reaches 92.2 GOPS at 100 MHz. Logic synthesis on the TSMC 130 nm process node gives a dynamic power of 33 mW and an area of 90 764.2 μm², for an energy efficiency of 2 793 GOPS/W, about 100 times that of FPGA accelerator schemes. The accelerator's low power and low cost favor the wide use of embedded systems in machine vision applications such as object detection and face recognition.
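
The reported energy efficiency can be checked from the other figures in the abstract. The short sketch below divides the FPGA-measured throughput by the synthesized dynamic power, as the abstract appears to do, and also derives the average number of operations per clock cycle. The variable names are illustrative, and the convention that one multiply-accumulate (MAC) counts as two operations is an assumption that the abstract does not state.

```python
# Figures quoted in the abstract.
throughput_gops = 92.2    # GOPS, FPGA prototype at 100 MHz
clock_hz = 100e6          # 100 MHz
dynamic_power_w = 33e-3   # 33 mW, TSMC 130 nm logic synthesis

# Energy efficiency = throughput / power.
efficiency_gops_per_w = throughput_gops / dynamic_power_w

# Average operations completed per clock cycle.
ops_per_cycle = throughput_gops * 1e9 / clock_hz

# Assumption (not stated in the abstract): one MAC counts as two operations.
macs_per_cycle = ops_per_cycle / 2

print(f"{efficiency_gops_per_w:.1f} GOPS/W")
print(f"{ops_per_cycle:.0f} ops/cycle, about {macs_per_cycle:.0f} MACs/cycle")
```

The ratio comes to roughly 2 793.9 GOPS/W, which agrees with the quoted 2 793 GOPS/W if that figure was truncated rather than rounded. Under the two-operations-per-MAC assumption, the throughput corresponds to about 461 MACs per cycle on average.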

Keywords: convolutional neural network; parallel computing; pipeline; hardware accelerator; application-specific integrated circuit

Qiao Jianhua (乔建华), Wu Yan (吴言), Li Yaning (栗亚宁), Lei Guangzheng (雷光政)

School of Electronic Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, Shanxi, China

Funding: Shanxi Province Graduate Education Reform Research Project (2021YJJG247)

Journal: Chinese Journal of Electron Devices (电子器件), Southeast University
Indexed in: CSTPCD
Impact factor: 0.569
ISSN: 1005-9490
Year, volume (issue): 2024, 47(1)