Low-power Face Detection Acceleration System Based on the Zynq Platform
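Zhao Min (赵民) 1, Xu Sheng (徐胜) 1, Han Luyu (韩路宇) 1, Lin Zhixian (林志贤) 2
Author information
- 1. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350116, China; Fujian Science & Technology Innovation Laboratory for Optoelectronic Information of China, Fuzhou 350116, China
- 2. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350116, China; Fujian Science & Technology Innovation Laboratory for Optoelectronic Information of China, Fuzhou 350116, China; School of Advanced Manufacturing, Fuzhou University, Quanzhou, Fujian 362200, China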
Abstract
Convolutional neural network (CNN) platforms based on CPUs and GPUs are bulky and consume considerable power. To address these issues, we designed and implemented a CNN-based face detection acceleration system on the Zynq platform. The system adopts the YOLOv3-Tiny algorithm and is trained on the WIDER FACE dataset. To improve network efficiency, a layer-fusion technique reduces the network depth and speeds up detection. In addition, an 8-bit integer quantization strategy lowers the volume of memory accesses and reduces resource consumption. Leveraging the parallel computing capability of the field-programmable gate array (FPGA) fabric on the Zynq XC7Z035 chip, we designed a reusable multichannel convolution computation module so that the digital signal processing (DSP) resources are reused. Experimental results show that the acceleration system achieves a real-time inference speed of 9.5 FPS, a detection speed 7.9 times that of an Intel i7-8700 CPU, while consuming only 2.65 W, which satisfies the low-power requirement.
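The abstract names layer fusion and 8-bit integer quantization without giving their details. The following is a minimal NumPy sketch of what such steps can look like, under two assumptions that are not confirmed by the abstract: that layer fusion means folding batch normalization into the preceding convolution, and that quantization is symmetric and per tensor. The function names `fold_batchnorm`, `quantize_int8`, and `dequantize` are hypothetical and chosen for illustration only.

```python
import numpy as np


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm parameters into the preceding convolution (assumed fusion scheme).

    weight: (out_ch, in_ch, kh, kw); bias, gamma, beta, mean, var: (out_ch,)
    Returns a fused weight and bias so that conv(x, fused) == BN(conv(x, original)).
    """
    scale = gamma / np.sqrt(var + eps)                  # one factor per output channel
    fused_weight = weight * scale[:, None, None, None]  # scale every filter
    fused_bias = (bias - mean) * scale + beta           # absorb the BN shift
    return fused_weight, fused_bias


def quantize_int8(x):
    """Symmetric per-tensor quantization to int8 (assumed scheme); returns (q, scale)."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0     # avoid division by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate floating-point values."""
    return q.astype(np.float32) * scale
```

After folding, the batch-normalization layer no longer needs to be executed at inference time, and the fused weights can be quantized once offline, so that only 8-bit values have to be moved between memory and the compute units.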
Key words
convolutional neural network / layer fusion / quantization / multichannel convolution / field-programmable gate array (FPGA)
Funding
National Key Research and Development Program of China (2021YFB3600603)
Natural Science Foundation of Fujian Province (2020J01468)
Publication year
2024