Multimodal LiDAR Enhancement Algorithm Based on Multiscale Features

LiDAR scans the surrounding environment and uses the resulting measurements to construct a three-dimensional (3D) point cloud for environment perception, and it is widely used in vehicle environment perception tasks. However, LiDAR cannot perceive semantic information in the environment, which limits its effectiveness in 3D object detection to some extent. To improve LiDAR-based 3D object detection in complex environments, we design a multimodal fusion LiDAR enhancement algorithm based on multiscale features and introduce several innovations within the Transformer framework. In the encoder, multiscale semantic features extracted by a semantic-aware aggregation module are used for cross-modal feature fusion; in the decoder, scale self-attention and proposal-guided initialization make the prediction process more efficient. We also design a triangular loss function to assist the regression of the predicted box position: it uses triangular geometric constraints to restrict the regressed position of the predicted box to lie between the 2D and 3D labels, yielding better predictions. Experiments on the nuScenes dataset demonstrate the effectiveness and robustness of the proposed model.
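The abstract does not give the form of the triangular loss, so the following is only a minimal sketch of one way a triangle-based constraint could keep a predicted position between two reference points. It assumes, purely for illustration, that the predicted box center and the centers derived from the 2D and 3D labels are expressed in a common plane; the function name `triangle_penalty` and its arguments are hypothetical and not taken from the paper. The penalty is the slack in the triangle inequality, which is zero exactly when the prediction lies on the segment joining the two label points.

```python
import math


def triangle_penalty(pred, label_2d, label_3d):
    """Triangle-inequality slack: d(pred, a) + d(pred, b) - d(a, b).

    Hypothetical illustration, not the paper's loss. The value is zero
    when `pred` lies on the segment between the two label points and
    grows as `pred` moves away from that segment.
    """
    d_pa = math.dist(pred, label_2d)      # prediction to 2D-label point
    d_pb = math.dist(pred, label_3d)      # prediction to 3D-label point
    d_ab = math.dist(label_2d, label_3d)  # distance between the two labels
    # Clamp at zero to absorb tiny negative floating-point rounding error.
    return max(0.0, d_pa + d_pb - d_ab)


if __name__ == "__main__":
    a, b = (0.0, 0.0), (2.0, 0.0)            # assumed label points
    print(triangle_penalty((1.0, 0.0), a, b))  # on the segment
    print(triangle_penalty((1.0, 1.0), a, b))  # off to the side
    print(triangle_penalty((3.0, 0.0), a, b))  # on the line, past one label
```

In a training setting a term like this would be added to the usual box regression loss with a weighting factor; how the paper actually combines or defines its terms is not stated in the abstract.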

LiDAR; multimodal fusion; Transformer; 3D object detection; bird's-eye view

Luo Yikai, He Linyuan, Ma Shiping

Aviation Engineering School, Air Force Engineering University, Xi'an 710038, Shaanxi, China

2024

Laser & Optoelectronics Progress
Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.153
ISSN: 1006-4125
Year, volume (issue): 2024, 61(18)