Transformer-based Object Tracking Algorithm with Feature Fusion

In recent years, object tracking networks based on deep learning have made significant advances. These networks primarily employ two types of frameworks: the dual-stream dual-stage framework and the single-stream single-stage framework. However, the former overlooks information interaction during feature extraction, while the latter is constrained by the inherent limitations of its backbone network. Therefore, this paper uses an independent backbone network to construct the tracker directly and designs a lightweight multi-scale feature fusion architecture that strengthens the network's perception of multi-scale information at a low computational cost. It also adopts recursive gated convolution as the feature learning unit, mining deeper features through adaptive high-order spatial interactions. Additionally, this paper initializes the network with the DropMAE pre-trained model to improve its generalization capability. Experimental results demonstrate that the proposed tracking network performs strongly on multiple large-scale tracking benchmarks and runs in real time at 78.4 FPS.
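The abstract names recursive gated convolution as the feature learning unit but does not describe it. The sketch below illustrates the generic recursion of gnConv as introduced in HorNet (Rao et al., NeurIPS 2022); it is not the authors' implementation, and the function names (`conv1x1`, `depthwise_conv`, `mul`, `make_gnconv_params`, `gnconv`), the channel count, the order, the 3x3 kernel, the `scale` factor, and the random weights are all hypothetical choices for illustration. A 1x1 projection splits the input into a gate and a group of depthwise-convolved feature maps; the gate is multiplied by the first map, and each later step projects the running result and multiplies it by the next map, so an order-n block applies n gated multiplications and its output is a polynomial of degree n + 1 in the input.

```python
import random

# Tensors are nested lists shaped [channels][height][width]; all layers are bias-free.

def conv1x1(x, w):
    """Pointwise (1x1) convolution: w has shape [c_out][c_in]."""
    h, wd = len(x[0]), len(x[0][0])
    return [[[sum(w_o[c] * x[c][i][j] for c in range(len(x))) for j in range(wd)]
             for i in range(h)] for w_o in w]


def depthwise_conv(x, kernels):
    """Depthwise k x k convolution with zero padding; output size equals input size."""
    h, wd = len(x[0]), len(x[0][0])
    out = []
    for ch, k in zip(x, kernels):
        r = len(k) // 2
        out.append([[sum(k[di + r][dj + r] * ch[i + di][j + dj]
                         for di in range(-r, r + 1) for dj in range(-r, r + 1)
                         if 0 <= i + di < h and 0 <= j + dj < wd)
                     for j in range(wd)] for i in range(h)])
    return out


def mul(a, b):
    """Element-wise product of two tensors with the same shape."""
    return [[[p * q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def make_gnconv_params(dim, order, ksize=3, seed=0):
    """Hypothetical random weights; dim must be divisible by 2 ** (order - 1)."""
    rng = random.Random(seed)
    dims = [dim // 2 ** i for i in range(order)][::-1]  # dim=8, order=3 -> [2, 4, 8]

    def rand(rows, cols):
        return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "dims": dims,
        "proj_in": rand(2 * dim, dim),               # dims[0] + sum(dims) == 2 * dim
        "dw": [rand(ksize, ksize) for _ in range(sum(dims))],
        "pws": [rand(dims[i + 1], dims[i]) for i in range(order - 1)],
        "proj_out": rand(dim, dim),
    }


def gnconv(x, p, scale=1.0):
    """Recursive gated convolution of order len(p['dims'])."""
    dims = p["dims"]
    fused = conv1x1(x, p["proj_in"])
    gate, rest = fused[:dims[0]], fused[dims[0]:]
    dw = depthwise_conv(rest, p["dw"])
    dw = [[[v * scale for v in row] for row in ch] for ch in dw]
    chunks, start = [], 0
    for d in dims:                                   # split into groups of dims[k] channels
        chunks.append(dw[start:start + d])
        start += d
    y = mul(gate, chunks[0])                         # first gated multiplication
    for k, pw in enumerate(p["pws"]):
        y = mul(conv1x1(y, pw), chunks[k + 1])       # each step raises the degree by one
    return conv1x1(y, p["proj_out"])


if __name__ == "__main__":
    rng = random.Random(1)
    x = [[[rng.uniform(-1, 1) for _ in range(5)] for _ in range(5)] for _ in range(8)]
    y = gnconv(x, make_gnconv_params(dim=8, order=3))
    print(len(y), len(y[0]), len(y[0][0]))           # 8 5 5
```

Because every projection in this sketch is bias-free, doubling the input of an order-3 block multiplies its output by 2^4 = 16, which is a quick way to see the high-order interaction at work. A practical tracker would implement the same structure with a tensor library rather than nested lists.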

Keywords: visual object tracking; single-stream single-stage framework; multi-scale feature fusion; recursive gated convolution; network initialization

Authors: Guan Xu (管旭), Hu Chunyan (胡春燕), Li Feifei (李菲菲)


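School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China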


2025

Journal of Chinese Computer Systems (小型微型计算机系统)
Shenyang Institute of Computing Technology, Chinese Academy of Sciences


PKU Core Journal (北大核心)
Impact factor: 0.564
ISSN: 1000-1220
Year, volume (issue): 2025, 46(1)