Multimodal 3D Object Detection Method Based on Pseudo Point Cloud Feature Enhancement
Environment perception is one of the key technologies for deploying autonomous vehicles, and it is crucial to improving their safety and reliability. Three-dimensional (3D) object detection is a core task in environment perception: it identifies and locates objects in 3D space and provides essential information for the subsequent decision making and actions of an autonomous vehicle. Point clouds and images are the most commonly used inputs for 3D object detection. However, point clouds consist of irregularly distributed points scattered in 3D space, whereas images consist of regularly distributed pixels in 2D space, which makes it difficult to map and fuse the two modalities effectively. In recent years, image pseudo point clouds, which encode image information in point cloud form, have received widespread attention from researchers in this field. Nevertheless, 3D object detection methods based on image pseudo point clouds still face two issues: the feature extraction process for image pseudo point clouds remains relatively coarse, and the representation ability of their Region of Interest (RoI) features remains weak. To address these two issues, this paper proposes fine-grained attention convolution and multi-scale group sparse convolution, respectively. Fine-grained attention convolution introduces depth-wise separable convolution, commonly used in regular image processing, into irregular point cloud processing, and embeds a channel attention mechanism and a group attention mechanism to extract and enhance image pseudo point cloud features at a fine granularity; this convolution enriches the fine-grained information of the pseudo point cloud features. Multi-scale group sparse convolution groups the image pseudo point cloud RoI features after grid pooling and performs differentiated feature learning on the grouped features to obtain RoI features at different scales; this convolution strengthens the representation ability of the RoI features grouped from image pseudo point clouds. On this basis, this paper constructs the SFD++ multimodal 3D object detection network, which introduces the proposed fine-grained attention convolution into the image pseudo point cloud feature extraction of the SFD (Sparse Fuse Dense) 3D object detection network and introduces multi-scale group sparse convolution into its image pseudo point cloud RoI feature learning. Experiments on the authoritative KITTI autonomous driving dataset show that the constructed SFD++ processes 8.33 frames per second and achieves average precisions of 95.74%, 88.80%, and 86.04% on the easy, moderate, and hard 3D car detection subsets, respectively, which are 0.15%, 0.84%, and 0.58% higher than those of the current second-best 3D object detection network, SFD. In addition, a series of ablation and supplementary experiments on the KITTI dataset verify the effectiveness of the proposed fine-grained attention convolution and multi-scale group sparse convolution, as well as the rationality of the related parameter settings.
Keywords: autonomous driving; 3D object detection; pseudo point cloud; attention mechanism; depth-wise separable convolution; group convolution
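For illustration, the sketch below gives minimal PyTorch analogues of the two ideas described in the abstract: a depth-wise separable transform with channel and group attention over pseudo point cloud features, and a channel-grouped multi-scale transform over RoI grid features. These are conceptual, dense stand-ins written under assumptions; all class names, parameter names, and hyper-parameters (groups, reduction, kernel sizes) are illustrative and do not reproduce the paper's actual sparse-convolution implementation.

```python
import torch
import torch.nn as nn


class FineGrainedAttentionConv(nn.Module):
    """Depth-wise separable transform over pseudo point cloud features,
    followed by channel attention and group attention (conceptual sketch)."""

    def __init__(self, channels: int, groups: int = 4, reduction: int = 4):
        super().__init__()
        # Depth-wise step: each channel is transformed independently.
        self.depthwise = nn.Conv1d(channels, channels, kernel_size=1, groups=channels)
        # Point-wise step: mix information across channels.
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)
        # Channel attention (squeeze-and-excitation style).
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Conv1d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Group attention: one scalar weight per channel group.
        self.groups = groups
        self.group_att = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Conv1d(channels, groups, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, num_points) pseudo point cloud features.
        y = self.pointwise(self.depthwise(x))
        y = y * self.channel_att(y)                     # re-weight channels
        b, c, n = y.shape
        g = self.group_att(y).unsqueeze(2)              # (batch, groups, 1, 1)
        y = y.view(b, self.groups, c // self.groups, n) * g
        return y.view(b, c, n)


class MultiScaleGroupConv(nn.Module):
    """Split RoI grid features into channel groups and learn each group with a
    different receptive field, then concatenate (dense analogue of the
    multi-scale group sparse convolution idea)."""

    def __init__(self, channels: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        assert channels % len(kernel_sizes) == 0
        g = channels // len(kernel_sizes)
        self.branches = nn.ModuleList(
            nn.Conv3d(g, g, kernel_size=k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, G, G, G) RoI features after grid pooling.
        chunks = torch.chunk(x, len(self.branches), dim=1)
        return torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=1)


if __name__ == "__main__":
    points = torch.randn(2, 64, 1024)     # 64-channel features for 1024 pseudo points
    rois = torch.randn(2, 96, 6, 6, 6)    # 96-channel RoI grid features (6x6x6 grid)
    print(FineGrainedAttentionConv(64)(points).shape)   # torch.Size([2, 64, 1024])
    print(MultiScaleGroupConv(96)(rois).shape)          # torch.Size([2, 96, 6, 6, 6])
```

In the actual SFD++ network, both operations are described as acting on sparse voxelized pseudo point cloud features; the dense 1D/3D convolutions above merely stand in for those sparse operators to show where the channel attention, group attention, and per-group multi-scale learning fit.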