RGB-T object tracking network based on multi-scale modality fusion
RGB-T (RGB-Thermal) object tracking has received much attention in the field of object tracking because it is less restricted by lighting conditions. An RGB-T object tracking network was proposed to address the differences in resolution and semantic information between features at different scales, the inconsistency between visible and thermal infrared modality information, and the shortcomings of existing networks' multimodal fusion strategies. The network adopted a Siamese structure and expanded the template image features and search image features output by the backbone feature extraction network from a single scale to multiple scales. Modality fusion of the visible and thermal infrared features was performed separately at each scale. The fused features were then enhanced by an attention mechanism to strengthen the feature representation. Finally, the prediction results were obtained by a region proposal network. Experimental results on two publicly available RGB-T datasets, GTOT and RGBT-234, show that the network achieves high tracking precision and success rate, copes with complex tracking scenarios, and delivers higher tracking performance than other networks.
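The pipeline described above (multi-scale feature expansion, per-scale modality fusion, then attention enhancement) can be illustrated with a minimal NumPy sketch. This is a simplified illustration under stated assumptions, not the paper's implementation: the fusion operator (elementwise addition), the attention mechanism (a squeeze-and-excitation-style channel gate), the scale set, and all function names are hypothetical choices made here for clarity.

```python
import numpy as np

def fuse_modalities(rgb_feat, tir_feat):
    # Hypothetical fusion operator: elementwise addition of the
    # visible (RGB) and thermal infrared (TIR) feature maps.
    return rgb_feat + tir_feat

def channel_attention(feat):
    # Hypothetical squeeze-and-excitation-style channel attention:
    # global average pooling per channel, sigmoid gate, reweighting.
    weights = feat.mean(axis=(1, 2))            # (C,) channel descriptors
    weights = 1.0 / (1.0 + np.exp(-weights))    # sigmoid gate in (0, 1)
    return feat * weights[:, None, None]        # rescale each channel

def multi_scale_fusion(rgb_feat, tir_feat, strides=(1, 2, 4)):
    # Expand single-scale backbone features (C, H, W) to several scales,
    # fuse the two modalities at each scale, then apply attention.
    fused = []
    for s in strides:
        r = rgb_feat[:, ::s, ::s]   # crude strided downsampling
        t = tir_feat[:, ::s, ::s]
        fused.append(channel_attention(fuse_modalities(r, t)))
    return fused
```

In the full tracker, each fused scale would feed the region proposal network alongside the corresponding template features; the sketch stops at the fused, attention-enhanced feature maps.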