
Automotive LiDAR 3D object detection algorithm based on multibranch feature fusion

This paper presents a 3D object detection algorithm based on multi-branch feature fusion. The algorithm divides unordered point clouds into regular voxels, learns voxel features with a voxel feature encoding module and a convolutional neural network, and compresses the sparse 3D data into a dense 2D bird's-eye view; the multi-scale bird's-eye-view features are then deeply fused through the coarse and fine branches of a 2D backbone network. The method aggregates the semantic, texture, and context information of multi-scale features and obtains more precise original spatial position information for object classification, position regression, and orientation prediction. It achieves excellent average precision on the KITTI dataset and remains strongly robust while maintaining a practical frame rate.
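The first step of the pipeline, dividing an unordered point cloud into regular voxels, can be sketched in plain Python. This is an illustrative sketch only; `voxel_size` and `max_points` are assumed example values, not the paper's configuration.

```python
from collections import defaultdict

def voxelize(points, voxel_size=(0.2, 0.2, 0.4), max_points=32):
    """Group unordered (x, y, z) points into a regular voxel grid.

    Each point is mapped to an integer voxel index by dividing its
    coordinates by the voxel dimensions; points sharing an index fall
    into the same voxel. max_points caps the points kept per voxel,
    as voxel-based detectors typically do before feature encoding.
    """
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (int(x // voxel_size[0]),
               int(y // voxel_size[1]),
               int(z // voxel_size[2]))
        if len(voxels[key]) < max_points:
            voxels[key].append((x, y, z))
    return dict(voxels)

# Two nearby points share a voxel; a distant point gets its own.
pts = [(0.1, 0.1, 0.1), (0.15, 0.12, 0.3), (1.1, 0.5, 0.0)]
vox = voxelize(pts)
```

In a full detector, each voxel's point list would next be passed through the voxel feature encoding module; here the sketch stops at the grid partition itself.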
[Objective] With the rapid popularization of new energy vehicles and the vigorous development of autonomous driving technology, 3D object detection algorithms play a pivotal role in real road scenes. LiDAR point clouds contain precise position and geometric structure information of objects and can accurately describe a target's position in 3D space; LiDAR thereby makes environmental perception and route planning for unmanned vehicles a reality. However, cars in real scenes often encounter complex and difficult situations, such as occlusion and truncation of objects, which produce highly sparse point clouds and incomplete contours. Therefore, effectively using disordered and unevenly distributed point clouds for accurate 3D object detection has important research significance and practical value for the safety of autonomous driving.

[Methods] This paper uses LiDAR point clouds in an autonomous driving scene to conduct in-depth research on a high-performance 3D object detection algorithm based on deep learning. A 3D object detection algorithm based on multi-branch feature fusion (PSANet) is designed to improve the capability and viability of autonomous driving technology. After the disordered point clouds are divided into regular voxels, a voxel feature encoding module and a convolutional neural network are used to learn voxel features, and the sparse 3D data are compressed into a dense 2D bird's-eye view. The multi-scale bird's-eye-view features are then deeply fused through the coarse and fine branches of the 2D backbone network. The splitting-and-aggregation feature pyramid module in the fine branch splits and aggregates the bird's-eye-view features at different levels, realizing a deep fusion of the semantic, texture, and context information of the multi-scale features to obtain more expressive features. The multi-scale features in the coarse branch are fused after transposed convolution, preserving precise original spatial location information. After feature extraction by the coarse and fine branches, element-wise addition yields more accurate and complete features for object classification, position regression, and orientation prediction.

[Results] Experimental results on the KITTI dataset show that the average precision of PSANet in the 3D object detection and bird's-eye-view object detection tasks reaches 81.72% and 88.25%, respectively. The inference speed on a single GTX 1080Ti GPU reaches 24 frames per second, and the algorithm shows strong robustness in complex scenes. Compared with the two-stage detection algorithms MV3D, AVOD-FPN, F-PointNet, and IPOD, the average precision of 3D object detection increases by 18.21%, 5.89%, 8.94%, and 3.12%, respectively. Compared with the one-stage detection algorithms VoxelNet, SECOND, PointPillars, and VoTr-SSD, the average precision of 3D object detection increases by 11.63%, 4.05%, 3.19%, 0.98%, 1.16%, and 0.7%, respectively. The detection speed improves by 14 frames per second over the two-stage algorithm PointRCNN, which has similar accuracy.

[Conclusions] Compared with other advanced algorithms, the proposed algorithm performs strongly and better balances the accuracy and speed of object detection in autonomous driving scenarios. Whether one- or two-stage, higher accuracy is required of such algorithms for novel applications, and an autonomous car must use the most efficient method available.
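The fusion step described in [Methods] ends with an element-wise addition of coarse- and fine-branch feature maps at a common resolution. The following is a minimal sketch, not the paper's implementation: nearest-neighbour 2x upsampling stands in for the transposed convolution, and plain nested lists stand in for feature tensors.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D feature map.

    A stand-in for the transposed convolution that brings a
    low-resolution map up to the fusion resolution (sketch only).
    """
    out = []
    for row in fmap:
        wide = [v for v in row for _ in (0, 1)]  # duplicate columns
        out.append(wide)
        out.append(list(wide))                   # duplicate rows
    return out

def fuse(coarse, fine):
    """Element-wise addition of two same-sized branch outputs."""
    return [[c + f for c, f in zip(cr, fr)]
            for cr, fr in zip(coarse, fine)]

coarse_map = [[1, 2], [3, 4]]            # low-resolution coarse branch (2x2)
fine_map = [[1] * 4 for _ in range(4)]   # fine branch at target size (4x4)
fused = fuse(upsample2x(coarse_map), fine_map)
# fused[0] == [2, 2, 3, 3]
```

The element-wise sum keeps the coarse branch's spatial localization while adding the fine branch's aggregated semantic and texture detail, which is the role the abstract assigns to this final fusion.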

LiDAR point cloud; 3D object detection; receptive field; feature fusion

Jin Weizheng (金伟正), Sun Yuan (孙原), Li Fangyu (李方玉)


School of Electronic Information, Wuhan University, Wuhan 430072, Hubei, China


Funding: National Key Research and Development Program of China (2018YFB1201602-05)

2024

Experimental Technology and Management (实验技术与管理)
Tsinghua University

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.651
ISSN:1002-4956
Year, volume (issue): 2024, 41(1)