
A Neural Network Model Acceleration Design Based on PYNQ

To address the heavy computational load and high resource demands of convolutional neural networks, this paper proposes a binarized neural network (Binary Neural Network, BNN) image classification model that is easy to deploy on low-power embedded devices at the mobile edge, together with a hardware acceleration design for it on an ARM (Advanced RISC Machines) + FPGA (Field Programmable Gate Array) heterogeneous system. By converting the multiply-accumulate operations of convolution into simple XNOR (Exclusive NOR) and popcount (population count) operations, the computational complexity and on-chip resource requirements are reduced; data reuse, pipelining, and parallel computation are employed to raise overall throughput. Taking image classification on the CIFAR-10 dataset as the target task, the network model is deployed on an FPGA platform using the Vivado HLS tool. Experimental results on the PYNQ-Z2 platform show that, at a 100 MHz clock frequency, the network model deployed on the FPGA side achieves an overall processing speed of about 631 FPS for image inputs of arbitrary size after cropping on the PS (Processing System) side, with a total runtime of only about 1.58 ms.

FPGA; image classification; neural network; hardware accelerator

Wei Xingjian, Sun Zeyu, Wang Zhengbin


College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing 210023, China

National and Local Joint Engineering Laboratory of RF Integration and Micro-Assembly Technology, Nanjing 210023, China


2025

Intelligent Computer and Applications
Harbin Institute of Technology


Impact factor: 0.357
ISSN: 2095-2163
Year, Volume (Issue): 2025, 15(1)