Computer Engineering & Science, 2024, Vol. 46, Issue (1): 12-20. DOI: 10.3969/j.issn.1007-130X.2024.01.002

Design of convolutional neural network acceleration system based on heterogeneous platform

Qin Wenqiang (秦文强) 1, Wu Zhongcheng (吴仲城) 2, Zhang Jun (张俊) 2, Li Fang (李芳) 2

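Author information

  • 1. Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601
  • 2. High Magnetic Field Laboratory, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui 230031; High Magnetic Field Laboratory of Anhui Province, Hefei, Anhui 230031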

Abstract

Deploying convolutional neural networks (CNNs) on embedded devices with limited computing and storage resources suffers from slow execution, low computational efficiency, and high power consumption. This paper proposes a novel CNN acceleration architecture based on a heterogeneous platform, and designs and implements a lightweight CNN acceleration system based on MobileNet. First, to reduce hardware resource consumption and data transfer costs, dynamic fixed-point quantization is combined with batch normalization fusion to optimize the network model and reduce the hardware design complexity of the acceleration system. Second, convolution tiling, parallel convolution computation, and data flow optimization are implemented to improve the efficiency of convolution operations and the system throughput. Experimental results on the PYNQ-Z2 platform show that the MobileNet inference scheme implemented by this acceleration system recognizes a single image in 0.18 s at a system power consumption of 2.62 W, a 128-fold speedup over a single-core ARM processor.
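The two model-level optimizations named in the abstract, batch normalization fusion and dynamic fixed-point quantization, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give bit widths or data layouts, so the 8-bit word length, the flat per-channel weight lists, and all function names below are assumptions made for illustration only.

```python
import math

def fold_batch_norm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into convolution weights and bias.

    weight: one flat list of weights per output channel (assumed layout).
    After folding, conv(x, w') + b' equals BN(conv(x, w) + b).
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)          # per-channel BN scale
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - m) * scale + bt)
    return folded_w, folded_b

def choose_frac_bits(values, total_bits=8):
    """Pick the fractional bit count so the largest magnitude fits a signed word.

    This is the "dynamic" part: each tensor gets its own binary point.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    int_bits = math.floor(math.log2(max_abs)) + 1   # may be negative for small values
    return total_bits - 1 - int_bits

def quantize(values, frac_bits, total_bits=8):
    """Round to signed fixed point with frac_bits fractional bits, saturating."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [min(hi, max(lo, round(v * scale))) for v in values]

def dequantize(q_values, frac_bits):
    """Map fixed-point integers back to real values."""
    scale = 2.0 ** frac_bits
    return [q / scale for q in q_values]
```

Folding removes the batch normalization layer from the inference path, so the hardware only has to implement convolution with a bias. Choosing the binary point per tensor keeps the word length short while adapting to each layer's value range, which is the usual motivation for dynamic fixed-point formats.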

Key words

field programmable gate array (FPGA) / Vivado high-level synthesis / convolutional neural network / heterogeneous platform / hardware acceleration


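Funding

Key Research and Development Project of the Hefei Science Center, Chinese Academy of Sciences (2019HSC-KPRD003)

Project of the Hefei Comprehensive National Science Center (QGCYY04)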

Publication year

2024
Computer Engineering & Science
College of Computer, National University of Defense Technology

Indexed in: CSTPCD, Peking University Core Journals (北大核心)
Impact factor: 0.787
ISSN: 1007-130X
Number of references: 3