A Highly Robust Target Tracking Algorithm Merging CNN and Transformer
Liu Peijin 1, Fu Xuefeng 1, Sun Haofeng 2, He Lin 3, Liu Shujie 1
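Author information
- 1. School of Mechanical and Electrical Engineering, Xi'an University of Architecture and Technology, Xi'an, Shaanxi 710055
- 2. School of Mechanical Engineering and Electronic Information, China University of Geosciences, Wuhan, Hubei 430074
- 3. School of Science, Xi'an University of Architecture and Technology, Xi'an, Shaanxi 710055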
Abstract
To address the degradation in tracking performance caused by target deformation, scale variation, fast motion, and occlusion, a highly robust target tracking algorithm merging a CNN and a Transformer is proposed on the basis of a Siamese network architecture. In the feature extraction stage, standard convolutions extract shallow local features, while a convolution-like Transformer module is designed for the deep network to model global information. The pixel values in the Transformer are computed in a sliding-window manner, which greatly reduces the computational cost. In the feature aggregation stage, a multi-head cross-attention module is used to build a feature enhancement and aggregation network that filters out interfering information and highlights template-related information, improving the discriminative power of the features. Compared with current mainstream algorithms, the proposed algorithm achieves the best evaluation metrics on the OTB2015 dataset under four challenges: deformation, scale variation, fast motion, and occlusion. On the GOT-10K dataset, the average overlap (AO) is 70.8%, which is 3.7% and 5.9% higher than that of the TransT and SiamR-CNN algorithms, respectively. On the LaSOT and UAV123 datasets, the success rates are 67.7% and 71.9%, respectively, improvements of 2.8% and 2.8% over TransT and of 2.9% and 7% over SiamR-CNN. In the robustness (R) evaluation on the VOT2018 and VOT2019 datasets, the proposed algorithm has the fewest tracking failures, with robustness scores of 0.112 and 0.266, respectively, improvements of 0.5% and 5% over the Ocean algorithm, which further verifies its higher robustness.
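The feature aggregation step described in the abstract can be illustrated with a generic multi-head cross-attention sketch, in which search-region tokens act as queries and template tokens act as keys and values. This is not the authors' implementation: the function name, the token and channel sizes, the randomly initialised projection matrices, and the residual connection are all assumptions made for illustration, and details such as normalisation layers and positional encodings are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_cross_attention(search, template, w_q, w_k, w_v, w_o, num_heads):
    """Aggregate template information into search-region features.

    search:   (n_s, d) flattened search-region tokens, used as queries
    template: (n_t, d) flattened template tokens, used as keys and values
    w_q, w_k, w_v, w_o: (d, d) projection matrices (hypothetical; a trained
    tracker would learn these)
    """
    n_s, d = search.shape
    n_t = template.shape[0]
    assert d % num_heads == 0, "channel size must be divisible by num_heads"
    d_h = d // num_heads

    # Project, then split channels into heads: (num_heads, tokens, d_h)
    q = (search @ w_q).reshape(n_s, num_heads, d_h).transpose(1, 0, 2)
    k = (template @ w_k).reshape(n_t, num_heads, d_h).transpose(1, 0, 2)
    v = (template @ w_v).reshape(n_t, num_heads, d_h).transpose(1, 0, 2)

    # Scaled dot-product attention: every search token weights every template token
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_h), axis=-1)
    heads = attn @ v  # (num_heads, n_s, d_h)

    # Merge heads, apply the output projection, and add a residual connection
    merged = heads.transpose(1, 0, 2).reshape(n_s, d) @ w_o
    return search + merged, attn


# Toy usage with assumed sizes: 64 search tokens, 16 template tokens, 32 channels
rng = np.random.default_rng(0)
d, num_heads = 32, 4
search = rng.standard_normal((64, d))
template = rng.standard_normal((16, d))
w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
fused, attn = multi_head_cross_attention(search, template, w_q, w_k, w_v, w_o, num_heads)
print(fused.shape, attn.shape)  # (64, 32) (4, 64, 16)
```

Each row of `attn` sums to 1, so every search token receives a weighted combination of template tokens. Template-relevant positions can therefore be emphasised while unrelated ones receive low weight, which is the general idea behind the filtering effect the abstract describes.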
Keywords
object tracking/siamese network/Transformer/multi-head cross-attention mechanism/high robustness
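Funding
Key Research and Development Program of Shaanxi Province (2022GY-134)
Special Scientific Research Project of the Shaanxi Provincial Department of Education (21JK0732)
Natural Science Special Project of Xi'an University of Architecture and Technology (ZR19058)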
Publication year
2024