Hybrid Optimization Design of a High-Performance YOLOv3-tiny Embedded Hardware Accelerator
Deploying neural networks on embedded devices is constrained by algorithm complexity, execution speed, and hardware resources. To address this problem, a high-performance YOLOv3-tiny network hardware accelerator was designed on a Zynq heterogeneous platform. For algorithm optimization, the convolutional and batch normalization layers were fused, and an 8-bit quantization algorithm was used to simplify computation. For the accelerator architecture, a dynamically configurable inter-layer pipeline and an efficient data transmission scheme were designed to shorten inference time and reduce storage resource consumption. For network forward inference, an 8-channel parallel pipelined convolution module was designed based on a loop unrolling strategy; pooling was performed with a step-by-step calculation strategy to process continuous data streams efficiently; and a 2x upsampling method based on data replication was proposed. Experimental results show that, at a system operating frequency of 200 MHz, the forward inference time is 232 ms, the power consumption is only 2.29 W, and the accelerator achieves an actual computing performance of 23.97 GOPS.
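The convolution and batch normalization (BN) fusion and the 8-bit quantization described above can be illustrated with a minimal NumPy sketch. Fusion folds the BN scale into the convolution weights and the BN shift into the bias, so inference needs only one convolution plus a bias add. The abstract does not specify the quantization scheme; symmetric per-tensor int8 quantization is assumed here purely for illustration, and the function names `fuse_conv_bn`, `quantize_int8` and `dequantize` are hypothetical.

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a BatchNorm layer into the preceding convolution.

    w: (out_ch, in_ch, kh, kw) conv weights; b: (out_ch,) conv bias.
    gamma, beta, mean, var: per-output-channel BN parameters.
    """
    scale = gamma / np.sqrt(var + eps)        # per-output-channel BN factor
    w_fused = w * scale[:, None, None, None]  # scale every weight of each output filter
    b_fused = beta + (b - mean) * scale       # new bias absorbs the BN mean and shift
    return w_fused, b_fused

def quantize_int8(x):
    """Symmetric per-tensor 8-bit quantization (an assumed scheme, not the paper's)."""
    max_abs = np.max(np.abs(x))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to real values for error checking."""
    return q.astype(np.float64) * scale
```

Because the fused layer is mathematically identical to convolution followed by BN, the BN layer costs nothing at inference time, and only the fused weights need to be quantized.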
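The loop unrolling idea behind the 8-channel parallel convolution module can be modeled in software as follows. This is a sketch under assumptions: the abstract does not say whether the 8 parallel channels are input or output channels, so here the input-channel loop is assumed to be unrolled by `PAR = 8`. Each innermost slice stands for one hardware step in which 8 multipliers work in parallel and an adder tree sums their products. The name `conv2d_unrolled` is hypothetical, and stride 1 without padding is assumed.

```python
import numpy as np

PAR = 8  # channels handled per step (assumed to be input channels)

def conv2d_unrolled(x, w):
    """x: (C, H, W) int8 input; w: (K, C, kh, kw) int8 weights.
    Stride 1, no padding. Returns (K, H-kh+1, W-kw+1) int64 accumulators."""
    C, H, W = x.shape
    K, _, kh, kw = w.shape
    out = np.zeros((K, H - kh + 1, W - kw + 1), dtype=np.int64)
    for k in range(K):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                acc = 0
                for c0 in range(0, C, PAR):  # outer loop over groups of PAR channels
                    for dy in range(kh):
                        for dx in range(kw):
                            # One hardware step: up to PAR multiplications in parallel
                            lanes = (x[c0:c0 + PAR, i + dy, j + dx].astype(np.int64)
                                     * w[k, c0:c0 + PAR, dy, dx])
                            acc += int(lanes.sum())  # adder tree reduces the lanes
                out[k, i, j] = acc
    return out
```

Unrolling the channel loop reduces the number of sequential steps per output pixel by roughly a factor of `PAR`, at the cost of `PAR` multipliers; if `C` is not a multiple of 8, the last group simply uses fewer lanes.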
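The step-by-step pooling and the replication-based 2x upsampling can be sketched as below. The abstract does not detail the pooling steps, so one plausible interpretation is assumed: a 2x2, stride-2 max pooling split into a horizontal comparison within each row (possible as pixels stream in) followed by a comparison between two consecutive row results. Only the stride-2 case is shown. Upsampling copies each value into a 2x2 block, which needs no multiplications. Both function names are hypothetical.

```python
import numpy as np

def maxpool2x2_stepwise(x):
    """2x2, stride-2 max pooling in two steps (assumed interpretation).
    x: (C, H, W) with even H and W."""
    row_max = np.maximum(x[:, :, 0::2], x[:, :, 1::2])           # step 1: neighbours in a row
    return np.maximum(row_max[:, 0::2, :], row_max[:, 1::2, :])  # step 2: pairs of rows

def upsample2x_replicate(x):
    """2x upsampling by copying each value into a 2x2 block. x: (C, H, W)."""
    return np.repeat(np.repeat(x, 2, axis=1), 2, axis=2)
```

Splitting the pooling window this way means only one row of partial maxima has to be buffered, which fits a continuous data stream.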
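The reported figures can be cross-checked with simple arithmetic. Assuming the 23.97 GOPS throughput R and the 2.29 W power P are both measured over the same 232 ms forward inference t (the abstract does not state the operation-counting convention), they imply the following per-inference workload, energy, and energy efficiency:

```latex
\begin{aligned}
N_{\text{op}} &\approx R \cdot t = 23.97\ \text{GOPS} \times 0.232\ \text{s} \approx 5.56\ \text{GOP},\\
E &\approx P \cdot t = 2.29\ \text{W} \times 0.232\ \text{s} \approx 0.531\ \text{J},\\
\eta &= \frac{R}{P} = \frac{23.97\ \text{GOPS}}{2.29\ \text{W}} \approx 10.47\ \text{GOPS/W}.
\end{aligned}
```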