Multi-scale multi-modal fusion for object detection in autonomous driving based on selective kernel
Object detection that fuses camera and LiDAR information remains a challenging task in autonomous driving, because the differences between the sensors' data increase the difficulty of fusion. To address this issue, we propose a multi-scale selective kernel fusion (MSSKF) method and demonstrate its practical utility by applying LiDAR-camera fusion in an object detection network. Specifically, we propose a multi-scale feature fusion module that uses multi-scale convolution to separate the feature representations of the multi-modal inputs and computes a weight for each modal feature channel. By combining multi-scale convolution with the selective kernel idea to perform multi-modal fusion in object detection, the method alleviates the matching difficulty between images and point clouds caused by their different data structures and fully exploits the complementarity of multi-modal information. To verify the effectiveness of MSSKF, experiments are conducted on the KITTI object detection benchmark. The proposed method detects pedestrians and vehicles more accurately, improving AP(50) by 1.6% over the original fusion method to 90.1%, with mAP reaching 60.9%. The experiments show that the proposed method introduces a new optimization idea for multi-modal fusion in autonomous driving object detection, and fusion detection runs at over 12 fps on a single GPU.
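As a rough illustration of the fusion module described in the abstract, the following PyTorch sketch shows a two-branch selective-kernel style fusion block: each modality passes through a convolution with a different kernel size (multi-scale), the branches are summed and squeezed into per-channel logits, and a softmax over the branches yields the channel-wise weights used to mix the two modalities. All class and parameter names, kernel sizes, and the reduction ratio are assumptions made for illustration; this is not the authors' implementation.

```python
import torch
import torch.nn as nn


class MultiScaleSelectiveKernelFusion(nn.Module):
    """Hypothetical sketch of a selective-kernel style LiDAR-camera fusion block."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        # Different receptive fields for the two modalities (assumed kernel sizes).
        self.img_conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.pc_conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        hidden = max(channels // reduction, 8)
        # Squeeze the fused feature into a compact per-channel descriptor.
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True))
        # 1x1 conv produces per-channel logits for both branches at once.
        self.attn = nn.Conv2d(hidden, channels * 2, kernel_size=1)

    def forward(self, img_feat, pc_feat):
        # img_feat, pc_feat: (B, C, H, W) features already projected onto
        # a common spatial grid (e.g. image plane or BEV).
        branches = torch.stack(
            [self.img_conv(img_feat), self.pc_conv(pc_feat)], dim=1)  # (B, 2, C, H, W)
        fused = branches.sum(dim=1)                                   # (B, C, H, W)
        logits = self.attn(self.squeeze(fused))                       # (B, 2C, 1, 1)
        weights = torch.softmax(
            logits.view(-1, 2, branches.size(2), 1, 1), dim=1)        # softmax over branches
        return (branches * weights).sum(dim=1)                        # channel-weighted fusion
```

For example, `MultiScaleSelectiveKernelFusion(256)` would take two 256-channel feature maps of the same spatial size and return a single fused 256-channel map whose per-channel mixing weights are learned from the data.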