Journal of Computer-Aided Design & Computer Graphics, 2024, Vol. 36, Issue 5: 734-749. DOI: 10.3724/SP.J.1089.2024.19862

3D Object Detection Method with Image Semantic Feature Guidance and Cross-Modal Fusion of Point Cloud

Li Hui¹, Wang Junyin¹, Cheng Yuanzhi², Liu Jian³, Zhao Guowei¹, Chen Shuangmin¹

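Author information

  • 1. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061
  • 2. Faculty of Computing, Harbin Institute of Technology, Harbin 150001
  • 3. College of Computer Science, Nankai University, Tianjin 300071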

Abstract

Owing to scene complexity, variations in object scale, and occlusion, 3D object detection still faces many challenges. Cross-modal fusion of image and LiDAR point cloud features can effectively improve 3D object detection, but both the fusion quality and the resulting detection performance leave room for improvement. To this end, this paper proposes a 3D object detection method based on image semantic feature guidance and cross-modal fusion with point clouds. First, an image semantic feature learning network is designed; it computes position and channel self-attention in two parallel branches to enhance global semantic features and reduce object misclassification. Second, a local fusion module guided by image semantic features is proposed; it uses element-level concatenation so that the retrieved local image semantic features guide the fusion with point cloud data, which better addresses the semantic alignment problem in cross-modal information fusion. Third, a multi-scale re-fusion network is proposed, in which an interaction module between the fused features and the LiDAR point cloud learns multi-scale connections within the fused features and the re-fusion of features at different resolutions, improving detection performance. Finally, four task losses are adopted to perform anchor-free 3D object detection. In comparisons with other methods on the KITTI and nuScenes datasets, the method reaches a 3D object detection accuracy of 87.15%, and the experimental results show that it outperforms the compared methods.
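The local fusion step described in the abstract (retrieving image semantic features for each point and concatenating them element-wise with the point features) can be illustrated with a minimal sketch. The abstract does not say how the retrieval is done, so everything below is an assumption for illustration only: a pinhole-style 3x4 projection matrix, nearest-pixel lookup, zero filling for points that fall outside the image, and the hypothetical names `project_point` and `fuse_point_with_image`. It is not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]                 # (x, y, z), assumed to be in the camera frame
FeatureMap = List[List[List[float]]]    # H x W x C image semantic features


def project_point(point: Point,
                  proj: Sequence[Sequence[float]]) -> Optional[Tuple[float, float]]:
    """Project a 3D point with a 3x4 matrix; return (u, v), or None if behind the camera."""
    x, y, z = point
    homo = (x, y, z, 1.0)
    u_h, v_h, w = (sum(row[i] * homo[i] for i in range(4)) for row in proj)
    if w <= 0.0:
        return None
    return u_h / w, v_h / w


def fuse_point_with_image(point_feats: Sequence[Sequence[float]],
                          points: Sequence[Point],
                          feat_map: FeatureMap,
                          proj: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate each point feature with the image feature at its projected pixel."""
    height, width = len(feat_map), len(feat_map[0])
    channels = len(feat_map[0][0])
    fused = []
    for point, p_feat in zip(points, point_feats):
        img_feat = [0.0] * channels     # assumption: zeros when no pixel is hit
        uv = project_point(point, proj)
        if uv is not None:
            col = math.floor(uv[0] + 0.5)   # nearest pixel column
            row = math.floor(uv[1] + 0.5)   # nearest pixel row
            if 0 <= row < height and 0 <= col < width:
                img_feat = list(feat_map[row][col])
        fused.append(list(p_feat) + img_feat)   # element-level concatenation
    return fused
```

Each fused vector has the point feature first and the retrieved image feature after it, so its length is the sum of the two channel counts. A real pipeline would more likely use bilinear sampling on tensors, but the per-point lookup and concatenation follow the same pattern.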

Key words

3D object detection/cross-modal/semantic feature/point cloud/anchor-free

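Funding

National Natural Science Foundation of China (62002190)

National Natural Science Foundation of China (61702295)

National Key Research and Development Program of China (2023YFF0612102)

Natural Science Foundation of Shandong Province (ZR2020MF036)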

Publication year

2024
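Journal of Computer-Aided Design & Computer Graphics
China Computer Federation

Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 0.892
ISSN: 1003-9775
Number of references: 5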