Optimization and Hardware Acceleration of the Post-Processing Algorithm for YOLO Object Detection

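The YOLO series of object detection networks is widely used for its high accuracy and low latency, but how to accelerate its post-processing has not been fully studied. Exploiting the computational characteristics of YOLO, this work optimizes the post-processing algorithm: (1) the detect layer and the post-processing computation are fused, and the confidence threshold check is moved ahead of the detect-layer computation to avoid useless computation and communication; (2) combined with model quantization, hardware acceleration of post-processing is implemented on a systolic array. Experiments show that the convolution workload of the detect layer of YOLOv3 and YOLOv5 is reduced by 87.3% to 99.9%. The accelerator is implemented on a Virtex UltraScale+ VCU112; at a 100 MHz clock frequency, the detect layer and post-processing of YOLOv3 achieve a speedup of 7.2 to 9.3 over the unoptimized flow, with a latency of 1 736 μs when selecting the 5 best boxes from 3 000 candidates. Compared with existing work, the detect layer and post-processing run 4.7 to 5.0 times faster, and post-processing requires only 9.9% to 10.5% of the flip-flop (FF) resources. Relative to the flow before post-processing optimization, the overall inference speed of the sparsified YOLOv3 network improves by 1.2% to 1.3%.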
Algorithm optimization and hardware acceleration for YOLO post processing
The YOLO series of object detection networks has been widely adopted because of its high precision and low latency, but how to accelerate its post-processing has not been fully studied. Utilizing the computational characteristics of YOLO, the post-processing algorithm is optimized: (1) the detect layer and post-processing are merged by performing the confidence threshold check in advance, so that redundant computation and communication are avoided; (2) based on model quantization and a systolic array, hardware acceleration for post-processing is realized. Experiments show that the convolution workload of the detect layer of YOLOv3 and YOLOv5 is reduced by 87.3%-99.9%. The hardware design is implemented on the Virtex UltraScale+ VCU112 at a 100 MHz clock frequency. Compared with the unoptimized computation flow, the speedup of the detect layer and post-processing of YOLOv3 reaches 7.2-9.3, and it takes 1 736 μs to select the 5 best boxes out of 3 000 candidates. Compared with previous works, the detect layer and post-processing are 4.7-5.0 times faster, while only 9.9%-10.5% of the FF resources are used for post-processing. The optimization improves the overall inference speed of the sparse YOLOv3 network by 1.2%-1.3%.
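The abstract does not spell out how the threshold check is moved ahead of the detect layer. The following is a minimal sketch of one plausible reading, not the authors' implementation: since sigmoid(z) > t is equivalent to z > logit(t), the objectness channel can be computed first and the remaining channels skipped for cells that fail. The function names, the single-anchor-per-cell simplification, and the channel layout [x, y, w, h, obj, classes...] are all assumptions made for illustration.

```python
import math


def logit(p):
    """Inverse sigmoid: sigmoid(z) > p  <=>  z > logit(p) for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def dot(w, x):
    """Dot product, i.e. one output channel of a 1x1 convolution at one cell."""
    return sum(wi * xi for wi, xi in zip(w, x))


def detect_full(features, weights, conf_thres):
    """Baseline: compute every output channel at every cell, then threshold."""
    kept, macs = [], 0
    for idx, x in enumerate(features):
        out = [dot(w, x) for w in weights]
        macs += len(weights) * len(x)
        # Assumed layout: channel 4 holds the objectness score before sigmoid.
        if 1.0 / (1.0 + math.exp(-out[4])) > conf_thres:
            kept.append((idx, out))
    return kept, macs


def detect_early_threshold(features, weights, conf_thres):
    """Compute objectness first; finish the other channels only if it passes."""
    z_thres = logit(conf_thres)  # threshold moved into the pre-sigmoid domain
    kept, macs = [], 0
    for idx, x in enumerate(features):
        obj = dot(weights[4], x)
        macs += len(x)
        if obj <= z_thres:
            continue  # cell rejected: the remaining channels are never computed
        out = [obj if c == 4 else dot(w, x) for c, w in enumerate(weights)]
        macs += (len(weights) - 1) * len(x)
        kept.append((idx, out))
    return kept, macs
```

Both functions return the surviving cells together with a multiply-accumulate count, so the saving can be compared directly. How much is saved depends on how few cells pass the threshold, which is consistent with the abstract reporting a range rather than a single figure.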

YOLO; object detection; post-processing; hardware acceleration; FPGA

Zou Zhiwei (邹知炜), Sun Wenhao (孙文浩), Chen Song (陈松)

School of Microelectronics, University of Science and Technology of China, Hefei 230026, Anhui, China

National Key Research and Development Program of China (2019YFB2204800); National Natural Science Foundation of China (61931008)

2024

Microelectronics & Computer
The 771st Research Institute of the Ninth Academy, China Aerospace Science and Technology Corporation

CSTPCD
Impact factor: 0.431
ISSN:1000-7180
Year, volume (issue): 2024, 41(4)