Image-radar fusion algorithm for 3D object detection
To address the challenges of data consistency and validity when fusing multimodal information in three-dimensional space, this paper proposes BEVIRF, an encoding module for image-radar fusion in bird's eye view (BEV). In contrast to traditional perspective-view schemes, which lack depth information, the method uses an improved variable attention scheme to aggregate image and radar information. This gives the different modalities a unified representation and generates semantically rich BEV feature maps that contain spatial location information. In addition, a dynamic position encoding is proposed for the Transformer-based network structure. It generates position encodings from the spatial information of each object, so that the model can focus on target regression and reduce instability in query matching. The proposed methods achieve strong results on the nuScenes dataset compared with state-of-the-art approaches.
three-dimensional object detection; convolutional neural network; attention mechanism; bird's eye view features