RGBT Tracking via a Multi-Layer Attention Mechanism
Extracting the complementary information of infrared and visible light data can effectively improve the robustness of visual tracking in complex environments. However, during feature extraction, most existing methods only extract single-modal features independently, ignoring the important role of multi-layer feature modeling in accurately locating the target. To address these problems, this paper proposes an RGBT tracking method based on a multi-layer attention mechanism. First, deep features of the two modalities are extracted by feeding the multi-modal images into the backbone network, and a modality attention module is introduced at each layer of feature extraction to filter out inaccurate multi-modal information, thus realizing effective multi-level and multi-modal feature modeling. In addition, to suppress noise and redundant information in the fused multi-modal features, a modality fusion module is proposed to further realize adaptive fusion of the multi-modal features and obtain more discriminative multi-modal representations. Experiments on two public datasets show that the proposed method achieves higher tracking accuracy and speed.
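To make the two stages concrete, the following is a minimal, framework-free sketch of the ideas the abstract describes: a modality attention step that re-weights the RGB and thermal features of one layer, and a fusion step that adaptively combines them. All function names, the pooling-plus-softmax scoring, and the sigmoid gate are illustrative assumptions, not the paper's actual learned modules; features are flattened to 1-D lists for simplicity.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def modality_attention(rgb_feat, tir_feat):
    """Re-weight each modality's feature vector by a modality score.

    Hypothetical stand-in for the paper's modality attention module:
    global-average-pool each modality, softmax the pooled scores into
    modality weights, and scale the features accordingly.
    """
    pooled = [sum(f) / len(f) for f in (rgb_feat, tir_feat)]
    w_rgb, w_tir = softmax(pooled)
    return ([w_rgb * v for v in rgb_feat],
            [w_tir * v for v in tir_feat])


def modality_fusion(rgb_feat, tir_feat):
    """Adaptively fuse two feature vectors element-wise.

    Illustrative gate: a sigmoid of the feature difference decides, per
    element, how much to trust each modality (a convex combination), so
    noisy or redundant responses from one modality are down-weighted.
    """
    fused = []
    for r, t in zip(rgb_feat, tir_feat):
        g = 1.0 / (1.0 + math.exp(-(r - t)))
        fused.append(g * r + (1.0 - g) * t)
    return fused
```

In the paper's pipeline this pair of steps would be applied at every layer of the backbone, with the attention and fusion parameters learned rather than fixed as here.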