Monocular 3D Detection Algorithm Based on Multi-scale Fusion and High-order Interaction
3D object detection is a fundamental and challenging task in 3D scene understanding. Methods based on monocular vision offer an economical alternative to stereo-based or radar-based methods. This paper proposes an improved monocular 3D detection algorithm based on MonoDLE to reduce the accuracy loss caused by the deviation between the estimated size and shape and the 3D position. First, a general multi-scale pooled attention module is proposed to aggregate finer multi-scale features and efficient context information. Second, to enhance the high-order spatial interaction ability of the model, a recursive gated convolution block, composed of recursive gated convolution and group normalization (GN), is proposed to replace the convolution layer of the up-sampling module in the baseline architecture, which effectively improves the representation ability of the up-sampling module. Experimental results on KITTI, the standard benchmark dataset for monocular 3D detection, show that after the multi-scale pooled attention module improves the feature aggregation ability of the network, the average precision metric AP40 of the proposed algorithm increases from 13.66 to 15.10 for 3D detection with an intersection over union (IoU) greater than 0.70. After the recursive gated convolution blocks enhance the high-order spatial interaction ability of the model, AP40 increases further from 15.10 to 15.53 under the same 3D setting with IoU greater than 0.70. With both modules working together, AP40 in the bird's-eye view with IoU greater than 0.70 also improves from 19.33 to 21.95.
monocular 3D detection; feature pyramid pooling; attention mechanism; recursive gated convolution; group normalization
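
The intersection-over-union (IoU) criterion quoted in the abstract can be illustrated with a minimal sketch. It assumes axis-aligned bird's-eye-view boxes given as (x_min, z_min, x_max, z_max); this is a simplification, since the KITTI evaluation uses rotated boxes. The names `bev_iou` and `is_true_positive` are hypothetical and are not taken from the paper or from the KITTI toolkit.

```python
from typing import Tuple

# Hypothetical box layout for this sketch: (x_min, z_min, x_max, z_max) in metres,
# axis-aligned in the ground plane. Real KITTI boxes also carry a yaw angle.
Box = Tuple[float, float, float, float]


def bev_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned bird's-eye-view boxes."""
    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    overlap_x = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_z = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_x * overlap_z

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) boxes.
    return intersection / union if union > 0.0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float = 0.70) -> bool:
    """A prediction counts as a match when its IoU with the ground truth
    is greater than the threshold (0.70 in the abstract)."""
    return bev_iou(pred, gt) > threshold


if __name__ == "__main__":
    ground_truth = (0.0, 0.0, 2.0, 4.0)
    close_prediction = (0.2, 0.0, 2.2, 4.0)   # shifted 0.2 m sideways
    loose_prediction = (0.5, 0.0, 2.5, 4.0)   # shifted 0.5 m sideways
    print(is_true_positive(close_prediction, ground_truth))
    print(is_true_positive(loose_prediction, ground_truth))
```

In this sketch a sideways shift of 0.5 m on a 2 m wide box already drops the IoU below 0.70, which suggests why small localization errors weigh heavily on AP40 at this threshold.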