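RGB and thermal (RGBT) object tracking exploits the complementary strengths of visible-light and thermal infrared data to improve a tracker's ability to localize targets in certain extreme environments. Existing work concentrates mainly on how to extract and fuse features from the two modalities and overlooks the potential value of the hierarchical deep features within each modality, which play an important role in target localization and classification. To this end, a multi-modal adaptive fusion tracking algorithm with multi-layer feature interaction, the Multi-layer Feature Interaction and Modal-adaptation Fusion Network (MIMFNet), is proposed. Hierarchical features are extracted and adaptively calibrated by feature extractors and attention mechanisms; a hierarchical feature aggregation sub-network then aggregates features from different layers in a top-down manner, so that low-level features keep their own spatial detail while also acquiring the semantic information of high-level features. A multi-modal information propagation module is designed to adaptively fuse the hierarchical information of the two modalities, focusing the model on higher-quality feature channels. Extensive experiments on multiple public datasets show that the proposed multi-modal tracking algorithm is highly robust to interference; in particular, tracking drift caused by factors such as Scale Variation (SV), Thermal Crossover (TC), and Occlusion (OCC) is markedly reduced.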
Abstract
RGBT object tracking benefits from the complementary advantages of the visible (RGB) and thermal infrared modalities, which can effectively enhance the object localization capability of trackers in challenging environmental conditions. Existing works mainly focus on how to extract and fuse features from these two modalities, while neglecting the potential value of the hierarchical deep features within each modality, which play crucial roles in object localization and classification. To address this problem, the Multi-layer Feature Interaction and Modal-adaptation Fusion Network (MIMFNet) is proposed for RGBT tracking. First, the algorithm extracts and adaptively calibrates hierarchical features through feature extractors and attention mechanisms. Second, a hierarchical feature aggregation sub-network combines features from different layers in a top-down fashion, allowing low-level features to retain their spatial details while capturing semantic information from high-level features. Finally, a multi-modal information propagation module is designed to adaptively fuse hierarchical information from both modalities, directing the model's focus towards higher-quality feature channels. Extensive experimental results on multiple publicly available datasets demonstrate the strong robustness to interference of the proposed RGBT tracking algorithm. In particular, significant improvements are achieved in handling tracking drift caused by factors such as Scale Variation (SV), Thermal Crossover (TC), and Occlusion (OCC).
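The two ideas named in the abstract, top-down aggregation of hierarchical features and channel-wise adaptive fusion of the two modalities, can be illustrated with a minimal NumPy sketch. This is not MIMFNet itself: the abstract does not give layer definitions, so the function names (`top_down_aggregate`, `fuse_modalities`), the nearest-neighbour upsampling, the additive merge, and the softmax-over-modalities channel weighting are all assumptions chosen for illustration, with no learned parameters.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down_aggregate(feats):
    """Hypothetical top-down aggregation (an assumption, not the paper's sub-network).

    feats: list of (C, H, W) arrays ordered low-level -> high-level,
    each level having half the spatial size of the previous one.
    Each lower level keeps its own map and adds the upsampled
    aggregate of the level above it.
    """
    out = [feats[-1]]                      # highest level is passed through
    for f in reversed(feats[:-1]):
        out.append(f + upsample2x(out[-1]))
    return out[::-1]                       # restore low -> high ordering

def fuse_modalities(rgb, tir):
    """Hypothetical channel-adaptive fusion of RGB and thermal features.

    Per-channel scores come from global average pooling; a softmax over
    the two modalities turns them into weights that sum to 1 per channel.
    """
    scores = np.stack([rgb.mean(axis=(1, 2)), tir.mean(axis=(1, 2))])  # (2, C)
    scores = scores - scores.max(axis=0, keepdims=True)                # stability
    w = np.exp(scores)
    w = w / w.sum(axis=0, keepdims=True)
    fused = w[0][:, None, None] * rgb + w[1][:, None, None] * tir
    return fused, w
```

In this sketch the lowest-level output keeps its full spatial resolution while accumulating contributions from every higher level, and the fusion step weights each channel towards whichever modality has the stronger pooled response. A trained model would replace both hand-written rules with learned layers.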