

Optimal design of computing resources for CNN convolution layer hardware
Traditional Convolutional Neural Network (CNN) dedicated accelerators suffer from low hardware resource utilization when performing convolution-layer operator reconfiguration, data reuse, and computing-resource sharing. To address this, a hardware architecture combining a dynamic register file with a reconfigurable PE array is designed; by optimizing the data flow, the load on each PE unit is balanced, thereby improving the utilization of the convolution layer's computing resources. The architecture can flexibly deploy odd convolution kernels of sizes 0 to 11 with strides of 1 to 10, and supports multi-channel parallel convolution and input-data reuse. The design is implemented in the Verilog hardware description language and functionally verified in a UVM environment. Experiments show that, when accelerating the convolutional layers of the AlexNet model, the peak throughput is 9.5% to 64.3% higher than in related work, and that, when mapping convolution kernels of different sizes and strides from five classical neural networks, the average PE utilization is 4% to 11% higher than in related work.
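The abstract's central claim is that mapping kernels of varying size and stride onto a fixed PE array leaves many PEs idle, and that a reconfigurable mapping recovers utilization. The following minimal Python sketch illustrates the idle-PE effect under one simple, assumed mapping rule (each kernel row occupies one PE row, each output column one PE column); the array dimensions and the mapping are illustrative assumptions, not the paper's actual architecture.

```python
# Illustrative sketch (assumed mapping, NOT the paper's design):
# estimate the fraction of PEs doing useful work when a k x k kernel
# with stride s is mapped onto a fixed rows x cols PE array.

def output_width(in_w: int, k: int, s: int) -> int:
    """Output columns of a valid convolution over in_w inputs."""
    return (in_w - k) // s + 1

def pe_utilization(k: int, s: int, rows: int = 12, cols: int = 14) -> float:
    """Assumed mapping: one PE row per kernel row, one PE column per
    output column computed in a pass; unused PEs sit idle."""
    rows_used = min(k, rows)
    # Feed just enough input columns to fill the array in one pass.
    cols_used = min(output_width(cols + k - 1, k, s), cols)
    return (rows_used * cols_used) / (rows * cols)

# AlexNet-style layer shapes: large strided kernel vs. small kernels.
for k, s in [(11, 4), (5, 1), (3, 1)]:
    print(f"k={k:2d}, s={s}: utilization = {pe_utilization(k, s):.2f}")
```

Under this toy model, a large stride shrinks the number of useful output columns and a small kernel leaves most PE rows idle, which is exactly the imbalance a reconfigurable array with load-balanced data flow is designed to remove.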

Keywords: reconfigurable PE; dynamic register file; flexibility; resource utilization

Wang Binyu, Yang Zhijia, Xie Chuang, Lian Lian, Wang Ying


School of Information Engineering, Shenyang University of Chemical Technology, Shenyang 110142, Liaoning, China

Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016, Liaoning, China

Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, Liaoning, China


Funding: National Key R&D Program of China

2022YFB3204501

2024

Microelectronics & Computer
The 771 Research Institute of the 9th Academy of China Aerospace Science and Technology Corporation


CSTPCD
Impact factor: 0.431
ISSN:1000-7180
Year, Volume (Issue): 2024, 41(7)