Exploring millimeter-wave radar data as a complement to RGB images for improving 3D object detection has become an emerging trend in autonomous driving systems. However, existing radar-camera fusion methods depend heavily on prior camera detection results, rendering the overall performance unsatisfactory. In this paper, we propose a bidirectional fusion scheme in the bird's-eye view (BEV-Radar) that is independent of prior camera detection results. Leveraging features from both modalities, our method adopts a bidirectional attention-based fusion strategy. Specifically, following BEV-based 3D detection methods, our method employs a bidirectional transformer to embed information from both modalities and enforces the local spatial relationship through subsequent convolution blocks. After embedding, the BEV features are decoded by the 3D object prediction head. We evaluate our method on the nuScenes dataset, achieving 48.2 mAP and 57.6 NDS. The results show considerable improvements over the camera-only baseline, especially in terms of velocity prediction. The code is available at https://github.com/Etah0409/BEV-Radar.
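To make the fusion idea concrete, the following is a minimal PyTorch sketch of bidirectional attention-based fusion of camera and radar BEV features, not the authors' released implementation (see the linked repository for that). The module name `BidirectionalBEVFusion`, the feature dimension, the single-layer symmetric cross-attention layout, and the trailing convolution block are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class BidirectionalBEVFusion(nn.Module):
    """Illustrative sketch: each modality's BEV tokens cross-attend to the
    other modality, then a convolution block enforces local spatial
    relationships on the fused BEV map. Hyperparameters are assumptions."""

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        # Camera tokens query radar tokens, and vice versa (bidirectional).
        self.cam_from_radar = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.radar_from_cam = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_cam = nn.LayerNorm(dim)
        self.norm_radar = nn.LayerNorm(dim)
        # Convolution block restoring local spatial structure after attention.
        self.conv = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_bev, radar_bev):
        # cam_bev, radar_bev: (B, C, H, W) BEV feature maps in a shared frame.
        b, c, h, w = cam_bev.shape
        cam = cam_bev.flatten(2).transpose(1, 2)      # (B, H*W, C)
        radar = radar_bev.flatten(2).transpose(1, 2)  # (B, H*W, C)
        # Symmetric bidirectional cross-attention with residual connections.
        cam_out = self.norm_cam(cam + self.cam_from_radar(cam, radar, radar)[0])
        radar_out = self.norm_radar(radar + self.radar_from_cam(radar, cam, cam)[0])
        # Reshape tokens back to BEV maps and fuse with the conv block.
        cam_out = cam_out.transpose(1, 2).reshape(b, c, h, w)
        radar_out = radar_out.transpose(1, 2).reshape(b, c, h, w)
        return self.conv(torch.cat([cam_out, radar_out], dim=1))  # (B, C, H, W)
```

Under these assumptions, feeding two aligned (B, 256, 128, 128) BEV maps returns a fused map of the same shape, which would then be passed to the 3D object prediction head described above.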
Keywords
3D object detection / sensor fusion / millimeter wave radar