A 3D object detection algorithm based on multi-modal data fusion with a dual attention mechanism
Aiming at the difficulty of detecting distant, small objects caused by the sparsity of point clouds and the insufficient information of single-modal data, a dual attention-based multi-modal fusion network (DAMFNet) for 3D object detection is proposed. First, a voxel multi-neighborhood feature extractor is designed to enlarge the voxel receptive field and fuse multiple voxel contexts, improving both the ability of voxel features to represent the spatial structure and semantics of objects and the robustness of those features. Second, multi-level semantic image features are extracted for the voxels: low-level structural features preserve object location information while high-level semantic features preserve semantic information, and both are used to enhance the voxel features. Finally, a multi-modal feature fusion module is designed, which uses channel attention to adaptively fuse features from different modalities, and voxel attention to strengthen the feature representation of relevant foreground objects while suppressing that of irrelevant background. Experimental results on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset show that, for distant, small objects, the proposed method achieves a substantial performance improvement over several mainstream single-modal and multi-modal methods.
object detection; multi-modal fusion; attention mechanism; multi-neighborhood features
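The dual-attention fusion described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the gating form, layer sizes, and the names `channel_attention_fuse` and `voxel_attention` are assumptions. Channel attention produces per-channel weights that blend the voxel and image modalities; voxel attention then produces a per-voxel scalar gate intended to boost foreground voxels and damp background ones.

```python
# Minimal sketch of dual-attention (channel + voxel) multi-modal fusion.
# All layer shapes and the sigmoid gating form are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_fuse(voxel_feat, image_feat, w_c):
    """Adaptively fuse two modalities per channel.

    voxel_feat, image_feat: (N, C) features for N non-empty voxels.
    w_c: (2C, C) projection producing per-channel gates (hypothetical).
    """
    stacked = np.concatenate([voxel_feat, image_feat], axis=1)  # (N, 2C)
    gate = sigmoid(stacked @ w_c)                               # (N, C), in (0, 1)
    # Convex per-channel blend of the two modalities.
    return gate * voxel_feat + (1.0 - gate) * image_feat

def voxel_attention(fused, w_v):
    """Per-voxel scalar gate: large for foreground, small for background."""
    score = sigmoid(fused @ w_v)   # (N, 1)
    return score * fused           # broadcasts over channels

# Toy features standing in for a voxel backbone and an image backbone.
N, C = 8, 16
voxel_feat = rng.standard_normal((N, C))
image_feat = rng.standard_normal((N, C))
w_c = rng.standard_normal((2 * C, C)) * 0.1
w_v = rng.standard_normal((C, 1)) * 0.1

fused = channel_attention_fuse(voxel_feat, image_feat, w_c)
out = voxel_attention(fused, w_v)
print(out.shape)  # (8, 16)
```

In a trained network the projections `w_c` and `w_v` would be learned layers; the sketch only shows how the two gates compose, with the voxel gate applied after the channel-wise modality blend.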