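To further improve how tracking algorithms exploit historical frame information and represent target features, this paper proposes a Transformer visual tracking algorithm based on feature enhancement and history frame selection (FEHST). First, a dynamic prediction module is introduced into the backbone network; its sparsification strategy improves the computational efficiency of the self-attention mechanism and focuses on the features of the target region. Second, a feature enhancement module is proposed that combines the strengths of local and global information to improve feature representation. Finally, an adaptive history frame selection strategy is adopted so that the tracker attends more closely to the target's dynamic information. Extensive experiments were conducted on the LaSOT, TrackingNet, GOT-10K, and OTB100 datasets. The results show success rates of 70.1%, 83.0%, and 71.6% on LaSOT, TrackingNet, and OTB100, respectively, and an average overlap of 71.4% on GOT-10K, at a running speed of 27 FPS.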
Feature enhancement and history frame selection based Transformer visual tracking
To improve how tracking algorithms use historical frame information and represent target features, this paper proposes FEHST, a feature enhancement and history frame selection based Transformer visual tracking algorithm. First, a dynamic prediction module is integrated into the backbone network; its sparsification strategy improves the computational efficiency of the self-attention mechanism and focuses on the features of the target region. Then, a feature enhancement module is introduced that merges local and global information to improve feature representation. Finally, an adaptive history frame selection strategy is adopted to strengthen the tracker's focus on target dynamics and its robustness. Experiments on the LaSOT, TrackingNet, GOT-10K, and OTB100 datasets validate the algorithm: it achieves success rates of 70.1%, 83.0%, and 71.6% on LaSOT, TrackingNet, and OTB100, respectively, and an average overlap of 71.4% on GOT-10K, while running at 27 FPS.
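The abstract does not say how the dynamic prediction module scores or prunes tokens, so the following is only a generic sketch of score-based token sparsification, not the FEHST implementation. It assumes each token already has a scalar importance score and that a fixed fraction of tokens is kept before self-attention; the names `select_tokens`, `sparsify`, and `keep_ratio` are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_tokens(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Return the indices of the highest-scoring tokens, in original order.

    Assumption: `scores` are per-token importance values from some
    predictor; how FEHST computes them is not given in the abstract.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Keep at least one token so attention always has an input.
    n_keep = max(1, int(len(scores) * keep_ratio))
    # Rank token indices by descending score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore the original ordering of the kept tokens.
    return sorted(ranked[:n_keep])


def sparsify(tokens: Sequence, scores: Sequence[float],
             keep_ratio: float) -> Tuple[list, List[int]]:
    """Drop low-score tokens; return the kept tokens and their indices."""
    keep = select_tokens(scores, keep_ratio)
    return [tokens[i] for i in keep], keep


if __name__ == "__main__":
    tokens = ["t0", "t1", "t2", "t3"]
    scores = [0.1, 0.9, 0.3, 0.8]
    kept, idx = sparsify(tokens, scores, keep_ratio=0.5)
    print(kept, idx)
```

Because self-attention cost grows quadratically with the number of tokens, keeping half of them cuts the attention computation to roughly a quarter, which is the general motivation for a sparsification strategy of this kind.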