Multimodal LiDAR Enhancement Algorithm Based on Multiscale Features

Abstract: LiDAR perceives the environment by scanning its surroundings and building a three-dimensional (3D) point cloud from the measurements, and it is widely used in vehicle environment perception tasks. However, LiDAR cannot perceive semantic information in the environment, which limits its effectiveness in 3D object detection to some extent. To improve LiDAR-based 3D object detection in complex environments, we design a multimodal fusion LiDAR enhancement algorithm based on multiscale features and introduce several innovations within the Transformer framework. In the encoder, multiscale semantic features extracted by a semantic-aware aggregation module are used for cross-modal feature fusion; in the decoder, scale self-attention and proposal-guided initialization make the prediction process more efficient. We also design a triangular loss function that assists the regression of the predicted box position: it uses triangular geometric constraints to restrict the regressed position to lie between the 2D and 3D labels, yielding better predictions. Experiments on the nuScenes dataset demonstrate the effectiveness and robustness of the proposed model.

Keywords:

LiDAR; multimodal fusion; Transformer; 3D object detection; bird's-eye view

Authors:

Luo Yikai, He Linyuan, Ma Shiping

Affiliation:

Aviation Engineering School, Air Force Engineering University, Xi'an 710038, Shaanxi, China

Publication year:

2024

DOI:

10.3788/LOP240778

Laser & Optoelectronics Progress (激光与光电子学进展)

Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 1.153

ISSN: 1006-4125

Year, volume (issue): 2024, 61(18)

Number of references: 3