Multi-scale multi-modal fusion for object detection in autonomous driving based on selective kernel
Object detection that fuses camera and LiDAR information remains a challenging task in autonomous driving, because the differences between the sensors' data increase the difficulty of fusion. To address this issue, we propose a multi-scale selective kernel fusion (MSSKF) method and demonstrate its practical utility by applying LiDAR-camera fusion in an object detection network. Specifically, we propose a multi-scale feature fusion module that uses multi-scale convolution to separate the feature representations of the multi-modal inputs and computes a weight for each modal feature channel. By combining multi-scale convolution with the selective kernel idea to perform multi-modal fusion in object detection, the method alleviates the matching difficulty between images and point clouds caused by their different data structures and fully exploits the complementarity of multi-modal information. To verify the effectiveness of MSSKF, experiments are conducted on the KITTI object detection benchmark. The proposed method detects pedestrians and vehicles more accurately, improving AP(50) by 1.6% over the original fusion method to 90.1%, with mAP reaching 60.9%. The experiments show that the proposed method introduces a new optimization idea for multi-modal fusion in autonomous driving object detection, and fusion detection runs at over 12 fps on a single GPU.
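As a rough illustration of the fusion module described in the abstract, the following PyTorch sketch shows a two-branch selective-kernel style fusion block: each modality passes through a convolution with a different kernel size (multi-scale), the branches are summed and squeezed into per-channel logits, and a softmax over the branches yields the channel-wise weights used to mix the two modalities. All class and parameter names, kernel sizes, and the reduction ratio are assumptions made for illustration; this is not the authors' implementation.

```python
import torch
import torch.nn as nn


class MultiScaleSelectiveKernelFusion(nn.Module):
    """Hypothetical sketch of a selective-kernel style LiDAR-camera fusion block."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        # Different receptive fields for the two modalities (assumed kernel sizes).
        self.img_conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.pc_conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        hidden = max(channels // reduction, 8)
        # Squeeze the fused feature into a compact per-channel descriptor.
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True))
        # 1x1 conv produces per-channel logits for both branches at once.
        self.attn = nn.Conv2d(hidden, channels * 2, kernel_size=1)

    def forward(self, img_feat, pc_feat):
        # img_feat, pc_feat: (B, C, H, W) features already projected onto
        # a common spatial grid (e.g. image plane or BEV).
        branches = torch.stack(
            [self.img_conv(img_feat), self.pc_conv(pc_feat)], dim=1)  # (B, 2, C, H, W)
        fused = branches.sum(dim=1)                                   # (B, C, H, W)
        logits = self.attn(self.squeeze(fused))                       # (B, 2C, 1, 1)
        weights = torch.softmax(
            logits.view(-1, 2, branches.size(2), 1, 1), dim=1)        # softmax over branches
        return (branches * weights).sum(dim=1)                        # channel-weighted fusion
```

For example, `MultiScaleSelectiveKernelFusion(256)` would take two 256-channel feature maps of the same spatial size and return a single fused 256-channel map whose per-channel mixing weights are learned from the data.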