Multimodal fusion 3D object detection based on NNC-EPNet
To address the challenge that current multimodal 3D object detection methods fail to effectively integrate target image features, this study proposes a multimodal 3D object detection method, NNC-EPNet, which introduces a Nearest Neighbor Correction (NNC) method to mitigate the impact of sparse target point clouds and non-target point clouds. First, the NNC module is designed to refine the sampled point clouds using the enhanced features of neighboring points. This process reduces noise in the point cloud data and strengthens the features of target point clouds, facilitating better integration of target image features. Second, a Multi-Modal Fusion Transformer (MFT) encoder is developed, which uses a cross-attention mechanism to fuse image and point cloud features and introduces a point cloud attention mechanism to aggregate global contextual information, thereby enhancing feature representation capability. Finally, comparative experiments are conducted on two standard autonomous driving datasets, KITTI and Waymo. Experimental results show that NNC-EPNet achieves an average detection accuracy of 84.47% on the KITTI dataset, with improvements of 2.00%, 3.25%, and 5.68% over the baseline algorithm in the easy, moderate, and hard scenarios, respectively. On the Waymo dataset, it achieves a weighted average accuracy of 74.48%, an improvement of 2.49% over the baseline. These results demonstrate that the two designed modules, NNC and MFT, effectively improve 3D object detection performance.
Keywords: 3D object detection; multimodality; feature fusion; point cloud correction; attention mechanism
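The cross-attention fusion inside the MFT encoder can be illustrated with a minimal sketch. This is not the paper's implementation; it only shows the standard scaled dot-product cross-attention pattern the abstract describes, with sampled point features acting as queries and image features as keys/values. All array shapes and the residual connection are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(point_feats, image_feats):
    """Fuse image features into point features via cross-attention.

    point_feats: (N, d) features of N sampled points (queries).
    image_feats: (M, d) features of M image tokens (keys/values).
    Returns fused point features of shape (N, d).
    """
    d = point_feats.shape[-1]
    # Scaled dot-product similarity between every point and every image token.
    scores = point_feats @ image_feats.T / np.sqrt(d)   # (N, M)
    weights = softmax(scores, axis=-1)                  # attention over image tokens
    attended = weights @ image_feats                    # (N, d) image context per point
    # Residual connection (an assumption here) keeps the original geometric features.
    return point_feats + attended

rng = np.random.default_rng(0)
points = rng.standard_normal((16, 32))   # 16 sampled points, 32-dim features
pixels = rng.standard_normal((64, 32))   # 64 image tokens, 32-dim features
fused = cross_attention_fuse(points, pixels)
print(fused.shape)  # (16, 32)
```

In the actual encoder, queries, keys, and values would pass through learned projections and multiple heads; the sketch keeps identity projections to isolate the fusion step itself.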