
Multi-modal Perception Fusion Method Based on Cross Attention

To address the limited perception capability of single sensors and the complexity of multi-sensor late-fusion processing in road target detection for intelligent vehicles, this study proposes a multi-modal perception fusion method based on the Transformer cross-attention mechanism. First, the method exploits the ability of cross-attention to fuse multimodal information effectively: an end-to-end deep-learning fusion perception network was constructed that receives the outputs of the visual and point cloud detection networks and performs late-fusion processing. Second, the 3D target detections output by the point cloud detection network were subjected to high-recall processing and fed into the network together with the road target detections output by the visual image detector. Finally, the network fuses the 2D target information into the 3D information and outputs corrections to the 3D target detections, yielding more accurate late-fusion detection results. Validation metrics on the public KITTI dataset show that, after 2D detection information is introduced through the proposed fusion method, the comprehensive average improvement across the three categories (vehicles, cyclists, and pedestrians) is 7.07%, 2.82%, 2.46%, and 1.60% over the four baseline methods PointPillars, PointRCNN, PV-RCNN, and CenterPoint, respectively. Compared with a rule-based late-fusion method, the proposed fusion network achieves average improvements of 1.88% and 4.90% in detecting moderate and hard samples of pedestrians and cyclists, respectively, further indicating that the proposed method has stronger adaptability and generalization ability. In addition, a real-vehicle test platform was built for algorithm validation, and a qualitative visual analysis of selected real-vehicle test scenarios verified the proposed detection method and network model in actual road scenarios.
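The abstract describes high-recall 3D candidates from a point cloud detector being refined by attending to 2D image detections. The following is a minimal NumPy sketch of that general idea under stated assumptions, not the authors' network: it assumes a single-head cross-attention layer in which each 3D candidate is a query and each 2D detection is a key/value, with the output read as a residual correction to the 7 box parameters plus a score-logit adjustment. The feature layouts, the 0.05 recall threshold, the names `high_recall_filter`, `CrossAttentionFusion`, and `correct_detections`, and the randomly initialized (untrained) weights are all hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def high_recall_filter(boxes3d, scores, thresh=0.05):
    # Hypothetical "high-recall" step: keep 3D candidates above a deliberately
    # low score threshold so the fusion stage can still rescue weak detections.
    keep = scores >= thresh
    return boxes3d[keep], scores[keep]


class CrossAttentionFusion:
    """Single-head cross-attention: 3D candidates (queries) attend to 2D detections (keys/values).

    All dimensions and the 8-value output head (7 box residuals + 1 score logit)
    are illustrative assumptions; weights are random, i.e. untrained.
    """

    def __init__(self, d_q=8, d_kv=6, d_model=32, d_out=8, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1  # small init scale keeps the untrained corrections small
        self.w_q = rng.normal(0.0, s, (d_q, d_model))
        self.w_k = rng.normal(0.0, s, (d_kv, d_model))
        self.w_v = rng.normal(0.0, s, (d_kv, d_model))
        self.w_out = rng.normal(0.0, s, (d_model, d_out))

    def __call__(self, q_feat, kv_feat):
        q = q_feat @ self.w_q   # (N3, d_model)
        k = kv_feat @ self.w_k  # (N2, d_model)
        v = kv_feat @ self.w_v  # (N2, d_model)
        # Scaled dot-product attention; each row (one 3D candidate) sums to 1.
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N3, N2)
        return (attn @ v) @ self.w_out, attn  # (N3, d_out), (N3, N2)


def correct_detections(boxes3d, scores, dets2d, model):
    # Query features per 3D candidate: (x, y, z, l, w, h, yaw, score).
    q_feat = np.concatenate([boxes3d, scores[:, None]], axis=1)
    delta, attn = model(q_feat, dets2d)
    new_boxes = boxes3d + delta[:, :7]  # residual correction of the 3D box
    p = np.clip(scores, 1e-6, 1.0 - 1e-6)
    logit = np.log(p / (1.0 - p))
    new_scores = 1.0 / (1.0 + np.exp(-(logit + delta[:, 7])))  # adjusted confidence
    return new_boxes, new_scores, attn


# Usage with made-up inputs: 5 raw 3D candidates, 3 normalized 2D detections
# encoded as (x1, y1, x2, y2, score, class_id).
rng = np.random.default_rng(1)
boxes = rng.normal(size=(5, 7))
scores = np.array([0.9, 0.3, 0.04, 0.12, 0.6])
dets2d = rng.uniform(size=(3, 6))

boxes_k, scores_k = high_recall_filter(boxes, scores)  # drops only the 0.04 candidate
model = CrossAttentionFusion()
new_boxes, new_scores, attn = correct_detections(boxes_k, scores_k, dets2d, model)
print(new_boxes.shape, new_scores.shape, attn.shape)  # prints: (4, 7) (4,) (4, 3)
```

Because the weights are untrained, the corrected values carry no meaning; the sketch only shows the data flow and tensor shapes. A practical system would also encode geometric cues, for example the image-plane projection of each 3D box, so that each query can attend to the spatially matching 2D detections.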

Keywords: automotive engineering; multimodal fusion; cross-attention; 3D target detection; late fusion; information correction

Zhang Bingli, Pan Zehao, Jiang Junzhao, Zhang Chengbiao, Wang Yixin, Yang Chenglei


School of Automotive and Transportation Engineering, Hefei University of Technology, Hefei 230041, Anhui, China

Anhui Provincial Intelligent Vehicle Engineering Laboratory, Hefei 230009, Anhui, China


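Funding: Yangtze River Delta Science and Technology Innovation Community Joint Research Project (2022CSJGG1501); Anhui Provincial Major Science and Technology Project (202203a05020008); New Energy Vehicle Industry Innovation and Development Project of the Anhui Provincial Development and Reform Commission (wfgcyh2021439); Hefei Key Generic Technology R&D and Major Scientific and Technological Achievement Engineering Project (2021CG003); China Speech Valley Innovation and Development Key Core Technology "Open Competition" Research Project (2108-340161-04-01-727575)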

2024

China Journal of Highway and Transport
China Highway and Transportation Society


CSTPCD; Peking University core journal
Impact factor: 1.607
ISSN:1001-7372
Year, Volume (Issue): 2024, 37(3)