Optimized object detection network design and FPGA hardware implementation
Object detection algorithms are constrained by increasingly stringent hardware power and storage requirements, which makes them difficult to deploy on miniature devices. To address this, this paper proposes a dedicated deep learning accelerator based on a field programmable gate array (FPGA) for edge deployment of object detection. By optimizing the convolutional operators of the original model and applying pruning and quantization, the parameter count was reduced by 52%. Experiments on the MLK-F20-CM02-3EG development board showed that the accelerator achieved a theoretical peak performance of 407 GOPS and an actual performance of 328 GOPS, with a digital signal processor (DSP) utilization of 64%, and its power consumption was 98% lower than that of large GPU platforms.
object detection; field programmable gate array acceleration; convolutional operator optimization; pruning; quantization; specially designed accelerator
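
As a reading aid, the two throughput figures in the abstract can be related by a simple ratio of achieved to theoretical peak performance. The sketch below is only illustrative arithmetic on the quoted numbers; the function name `compute_efficiency` and the constant names are assumptions introduced here, not part of the paper.

```python
# Figures quoted in the abstract.
PEAK_GOPS = 407.0      # theoretical peak performance
ACHIEVED_GOPS = 328.0  # actual (measured) performance


def compute_efficiency(achieved_gops: float, peak_gops: float) -> float:
    """Return achieved throughput as a fraction of the theoretical peak.

    Hypothetical helper for illustration only.
    """
    if peak_gops <= 0:
        raise ValueError("peak_gops must be positive")
    return achieved_gops / peak_gops


efficiency = compute_efficiency(ACHIEVED_GOPS, PEAK_GOPS)
print(f"Achieved / peak throughput: {efficiency:.1%}")
```

Under these figures the accelerator sustains roughly four fifths of its theoretical peak. This ratio is a separate quantity from the 64% DSP utilization, which describes how much of the board's DSP resources the design occupies.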