3D Object Detection Based on Voxel Self-Attention Auxiliary Networks
A voxel self-attention auxiliary (VSAA) network is proposed to address the poor detection performance of LiDAR object detection algorithms in autonomous driving scenes. This issue stems from the algorithms' reliance on convolutional neural networks (CNNs), which limits their understanding of spatial structure. The VSAA network can be applied directly to most voxel-based object detection algorithms to enhance their feature extraction capabilities. First, building on voxel feature encoding, the VSAA network constructs voxel hash tables as a secondary encoding, which makes the search for relevant voxels in the subsequent self-attention calculations more efficient. Second, the VSAA network applies the self-attention mechanism at the voxel level to capture global information and rich contextual semantic information. Finally, this study proposes the VA-SECOND and VA-PVRCNN algorithms by applying the VSAA network to the benchmark algorithms SECOND and PV-RCNN, respectively. The features of the VSAA network and the CNN are fused to compensate for the small receptive field of the CNN, which strengthens the detection ability of the algorithm and allows it to understand the entire spatial scene. Experimental results on the KITTI dataset show that, compared with the benchmark algorithms, VA-SECOND and VA-PVRCNN improve the average detection accuracy over all detected targets by 1.16 and 1.54 percentage points, respectively, demonstrating the effectiveness of the VSAA network.
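The two core steps described above, hash-based voxel lookup followed by voxel-level self-attention, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the cubic neighbor window (`radius`), the single attention head, and the projection matrices `wq`, `wk`, `wv` are all assumptions chosen for illustration, and the abstract does not specify how the attended voxel set is selected.

```python
import numpy as np

def build_voxel_hash(coords):
    """Map each integer voxel coordinate (x, y, z) to its row index."""
    return {tuple(int(v) for v in c): i for i, c in enumerate(coords)}

def gather_neighbors(coords, table, radius=1):
    """For each voxel, list the indices of non-empty voxels in a cubic window.

    The window shape is an assumption; the hash table only makes each
    lookup O(1) instead of scanning all voxels.
    """
    span = range(-radius, radius + 1)
    offsets = [(dx, dy, dz) for dx in span for dy in span for dz in span]
    neighbors = []
    for c in coords:
        idx = []
        for dx, dy, dz in offsets:
            j = table.get((int(c[0]) + dx, int(c[1]) + dy, int(c[2]) + dz))
            if j is not None:
                idx.append(j)
        neighbors.append(idx)  # always contains the voxel itself
    return neighbors

def voxel_self_attention(feats, neighbors, wq, wk, wv):
    """Single-head scaled dot-product attention over each voxel's neighbors."""
    q, k, v = feats @ wq, feats @ wk, feats @ wv
    scale = np.sqrt(q.shape[1])
    out = np.zeros_like(v)
    for i, idx in enumerate(neighbors):
        scores = k[idx] @ q[i] / scale
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        out[i] = weights @ v[idx]
    return out

# Toy usage: three non-empty voxels with 4-dimensional features.
rng = np.random.default_rng(0)
coords = np.array([[0, 0, 0], [1, 0, 0], [5, 5, 5]])
feats = rng.normal(size=(3, 4))
wq, wk, wv = (rng.normal(size=(4, 4)) for _ in range(3))

table = build_voxel_hash(coords)
neighbors = gather_neighbors(coords, table, radius=1)
attended = voxel_self_attention(feats, neighbors, wq, wk, wv)

# Hypothetical fusion step: concatenate with features from a CNN branch
# (random stand-in values here, not real backbone output).
cnn_feats = rng.normal(size=(3, 4))
fused = np.concatenate([cnn_feats, attended], axis=1)
```

In this sketch an isolated voxel attends only to itself, so its output equals its own value projection. Concatenation is only one possible way to fuse the two feature branches; the abstract states that the features are fused but does not say how.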