Efficient Accelerator for Depthwise Separable Convolutional Neural Networks Based on RISC-V
In the era of artificial intelligence, RISC-V, as an emerging open-source reduced instruction set computing architecture, has become a platform capable of adapting to evolving deep learning models and algorithms thanks to its low power consumption, modularity, openness, and flexibility. However, in environments with constrained hardware resources and power, a basic RISC-V processor falls short of the high-performance computing demands of convolutional neural networks. To address this issue, this paper presents a lightweight RISC-V-based accelerator for depthwise separable convolutional neural networks, aiming to compensate for the insufficient convolutional computing capability of RISC-V processors. The accelerator supports the two key operators of depthwise separable convolution, depthwise convolution and pointwise convolution, and improves resource utilization through shared hardware structures. The depthwise convolution pipeline employs the efficient Winograd convolution algorithm and reduces data redundancy by combining 2×2 data blocks into 4×4 data tiles. In addition, by extending the RISC-V instruction set, the accelerator can be configured and invoked more flexibly. Experimental results show significant acceleration of pointwise and depthwise convolution compared with the basic RISC-V processor, with speedups of 104.40× for pointwise convolution and 123.63× for depthwise convolution, while the accelerator reaches an energy efficiency of 8.7 GOPS/W. The combination of the RISC-V processor and the accelerator presented in this paper offers an efficient and practical option for deploying convolutional neural networks in resource-constrained environments.
Keywords: neural networks; depthwise separable convolution; Reduced Instruction Set Computer-V (RISC-V); Winograd fast convolution; hardware acceleration
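
As background for the two operators the accelerator supports, the following C sketch is a plain scalar software reference for depthwise separable convolution: a per-channel 3×3 depthwise pass followed by a 1×1 pointwise pass that mixes channels. It is an illustrative baseline of the kind of loop nest the accelerator is intended to replace, not the accelerator's datapath; the function name, CHW layout, 3×3 kernel size, stride 1, and no-padding choices are our assumptions rather than details taken from the paper.

#include <stddef.h>

/* Illustrative scalar reference for depthwise separable convolution.
 * Assumed layout: CHW, 3x3 depthwise kernel, stride 1, no padding.
 * in:   C x H x W            input feature map
 * dw_w: C x 3 x 3            one 3x3 filter per input channel
 * pw_w: M x C                1x1 pointwise filters
 * tmp:  C x (H-2) x (W-2)    depthwise result
 * out:  M x (H-2) x (W-2)    final output
 */
void depthwise_separable_conv(const float *in, const float *dw_w,
                              const float *pw_w, float *tmp, float *out,
                              size_t C, size_t M, size_t H, size_t W)
{
    size_t OH = H - 2, OW = W - 2;

    /* Depthwise convolution: each channel is filtered independently. */
    for (size_t c = 0; c < C; ++c)
        for (size_t y = 0; y < OH; ++y)
            for (size_t x = 0; x < OW; ++x) {
                float acc = 0.0f;
                for (size_t ky = 0; ky < 3; ++ky)
                    for (size_t kx = 0; kx < 3; ++kx)
                        acc += in[(c * H + y + ky) * W + x + kx] *
                               dw_w[(c * 3 + ky) * 3 + kx];
                tmp[(c * OH + y) * OW + x] = acc;
            }

    /* Pointwise (1x1) convolution: mixes channels at each spatial position. */
    for (size_t m = 0; m < M; ++m)
        for (size_t y = 0; y < OH; ++y)
            for (size_t x = 0; x < OW; ++x) {
                float acc = 0.0f;
                for (size_t c = 0; c < C; ++c)
                    acc += tmp[(c * OH + y) * OW + x] * pw_w[m * C + c];
                out[(m * OH + y) * OW + x] = acc;
            }
}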
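
The abstract's description of mapping 4×4 data tiles to 2×2 output blocks is consistent with the standard Winograd F(2×2, 3×3) formulation, sketched below with the textbook transform matrices; the exact transform variant and any numerical details used inside the accelerator are assumptions on our part, not confirmed by the abstract.

% Winograd F(2x2, 3x3): a 4x4 input tile d and a 3x3 kernel g yield a 2x2
% output tile Y using 16 element-wise multiplications instead of the 36
% required by direct convolution.
\[
Y = A^{\mathsf{T}}\bigl[(G\,g\,G^{\mathsf{T}}) \odot (B^{\mathsf{T}} d\, B)\bigr] A
\]
\[
B^{\mathsf{T}} =
\begin{pmatrix}
1 & 0 & -1 & 0\\
0 & 1 & 1 & 0\\
0 & -1 & 1 & 0\\
0 & 1 & 0 & -1
\end{pmatrix},\qquad
G =
\begin{pmatrix}
1 & 0 & 0\\
\tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2}\\
\tfrac{1}{2} & -\tfrac{1}{2} & \tfrac{1}{2}\\
0 & 0 & 1
\end{pmatrix},\qquad
A^{\mathsf{T}} =
\begin{pmatrix}
1 & 1 & 1 & 0\\
0 & 1 & -1 & -1
\end{pmatrix}
\]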