Multimodal fusion 3D object detection based on NNC-EPNet
To address the challenge that current multimodal 3D object detection methods fail to effectively integrate target image features, this study proposes a multimodal 3D object detection method, NNC-EPNet, which introduces a Nearest Neighbor Correction (NNC) method to mitigate the impact of sparse target point clouds and non-target point clouds. First, the NNC module is designed to refine the sampled point clouds using the enhanced features of neighboring points. This process reduces noise in the point cloud data and strengthens the features of target point clouds, facilitating better integration of target image features. Second, a Multi-Modal Fusion Transformer (MFT) encoder is developed, which uses a cross-attention mechanism to fuse image and point cloud features and introduces a point cloud attention mechanism to aggregate global contextual information, thereby enhancing feature representation capability. Finally, comparative experiments are conducted on two standard autonomous driving datasets, KITTI and Waymo. Experimental results show that NNC-EPNet achieves an average detection accuracy of 84.47% on the KITTI dataset, with improvements of 2.00%, 3.25%, and 5.68% over the baseline algorithm in the easy, moderate, and hard scenarios, respectively. On the Waymo dataset, it achieves a weighted average accuracy of 74.48%, an improvement of 2.49% over the baseline. These results demonstrate that the two designed modules, NNC and MFT, effectively improve 3D object detection performance.
Keywords: 3D object detection; multimodality; feature fusion; point cloud correction; attention mechanism
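The cross-attention fusion inside the MFT encoder can be illustrated with a minimal sketch. This is not the paper's implementation; it only shows the standard scaled dot-product cross-attention pattern the abstract describes, with sampled point features acting as queries and image features as keys/values. All array shapes and the residual connection are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(point_feats, image_feats):
    """Fuse image features into point features via cross-attention.

    point_feats: (N, d) features of N sampled points (queries).
    image_feats: (M, d) features of M image tokens (keys/values).
    Returns fused point features of shape (N, d).
    """
    d = point_feats.shape[-1]
    # Scaled dot-product similarity between every point and every image token.
    scores = point_feats @ image_feats.T / np.sqrt(d)   # (N, M)
    weights = softmax(scores, axis=-1)                  # attention over image tokens
    attended = weights @ image_feats                    # (N, d) image context per point
    # Residual connection (an assumption here) keeps the original geometric features.
    return point_feats + attended

rng = np.random.default_rng(0)
points = rng.standard_normal((16, 32))   # 16 sampled points, 32-dim features
pixels = rng.standard_normal((64, 32))   # 64 image tokens, 32-dim features
fused = cross_attention_fuse(points, pixels)
print(fused.shape)  # (16, 32)
```

In the actual encoder, queries, keys, and values would pass through learned projections and multiple heads; the sketch keeps identity projections to isolate the fusion step itself.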