Transformer-based Object Tracking Algorithm with Feature Fusion
In recent years, object tracking networks based on deep learning have advanced significantly. These networks primarily employ two types of framework: the dual-stream, dual-stage framework and the single-stream, single-stage framework. However, the former overlooks information interaction during feature extraction, while the latter inherits the limitations of the backbone network itself. This paper therefore uses an independent backbone network to construct the tracker directly and designs a lightweight multi-scale feature fusion architecture that enhances the network's ability to perceive multi-scale information at lower computational overhead. It also incorporates recursive gated convolution as the feature learning unit, enabling deep feature mining through adaptive high-order spatial interactions. In addition, the network is initialized with the DropMAE pre-trained model, which enhances its generalization capability. Experimental results demonstrate that the proposed tracking network consistently achieves significantly superior tracking performance on multiple large-scale tracking benchmark datasets while running in real time at 78.4 FPS.
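To illustrate what "high-order spatial interactions" means for a recursive gated convolution, the following is a minimal sketch rather than the network proposed here. It follows the general formulation introduced with HorNet (gnConv), simplified to a 1D sequence in pure Python with random weights, no biases, and no output scaling. The class name `RecursiveGatedConv1D`, the helper functions, and all parameter values are assumptions made for illustration; the abstract does not state the order, kernel size, or channel widths actually used.

```python
import random


def linear(x, w):
    """Pointwise projection. x: [length][c_in], w: [c_in][c_out]."""
    return [[sum(row[i] * w[i][j] for i in range(len(w)))
             for j in range(len(w[0]))] for row in x]


def depthwise_conv1d(x, kernels):
    """Per-channel 1D convolution with zero padding. kernels: [channels][k], k odd."""
    length, channels = len(x), len(x[0])
    half = len(kernels[0]) // 2
    out = [[0.0] * channels for _ in range(length)]
    for t in range(length):
        for c in range(channels):
            acc = 0.0
            for j, w in enumerate(kernels[c]):
                s = t + j - half
                if 0 <= s < length:  # positions outside the sequence count as zero
                    acc += w * x[s][c]
            out[t][c] = acc
    return out


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class RecursiveGatedConv1D:
    """Hypothetical 1D sketch of a recursive gated convolution (assumed layout)."""

    def __init__(self, dim, order=3, kernel_size=7, seed=0):
        rng = random.Random(seed)
        # Channel widths grow with each step: dim/2^(order-1), ..., dim/2, dim.
        self.dims = [dim // 2 ** i for i in range(order)][::-1]
        self.proj_in = rand_matrix(dim, self.dims[0] + sum(self.dims), rng)
        self.dw = rand_matrix(sum(self.dims), kernel_size, rng)
        self.pws = [rand_matrix(self.dims[i], self.dims[i + 1], rng)
                    for i in range(order - 1)]
        self.proj_out = rand_matrix(dim, dim, rng)

    def __call__(self, x):
        fused = linear(x, self.proj_in)
        d0 = self.dims[0]
        p = [row[:d0] for row in fused]  # initial features p_0
        # Gating features q_0..q_{order-1}, spatially mixed by one depthwise conv.
        q = depthwise_conv1d([row[d0:] for row in fused], self.dw)
        offset = 0
        for k, width in enumerate(self.dims):
            if k > 0:
                p = linear(p, self.pws[k - 1])  # widen channels to match q_k
            # Element-wise gating: each step raises the interaction order by one.
            p = [[a * b for a, b in zip(p_row, q_row[offset:offset + width])]
                 for p_row, q_row in zip(p, q)]
            offset += width
        return linear(p, self.proj_out)
```

Calling `RecursiveGatedConv1D(dim=8, order=3)` on a sequence of 8-channel feature vectors returns a sequence of the same length and width. Because each of the three gating steps multiplies the running features by a spatially mixed copy of the input, the output contains products of up to four input-dependent terms, which is the sense in which the interaction is "high-order". A 2D version for image features would replace the 1D depthwise convolution with a 2D one.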