Target tracking algorithm based on dynamic position encoding and attention enhancement
A tracking method based on dynamic position encoding and multi-domain attention feature enhancement was proposed to fully exploit the positional information between the template and the search region and to make better use of the network's feature representation capability. First, a position encoding module built from convolutional operations was embedded within the attention module, so that the position encoding was updated along with the attention computation and spatial structural information was used more effectively. Next, a multi-domain attention enhancement module was introduced. It sampled the spatial dimension using parallel convolutions with different dilation rates and strides to cope with targets of different sizes, and aggregated the resulting features with enhanced channel attention. Finally, a spatial-domain attention enhancement module was incorporated into the decoder to provide accurate classification and regression features for the prediction head. The proposed algorithm achieved an average overlap (AO) of 73.9% on the GOT-10K dataset and area under the curve (AUC) scores of 82.7%, 69.3%, and 70.9% on the TrackingNet, UAV123, and OTB100 datasets, respectively. Comparisons with state-of-the-art algorithms showed that the tracking model, which integrated dynamic position encoding with channel and spatial attention enhancement, strengthened the information interaction between the template and the search region and thereby improved tracking accuracy.
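For readers unfamiliar with the reported metric, the following is a minimal sketch of how an average overlap score can be computed for one sequence: the intersection over union (IoU) between predicted and ground-truth boxes, averaged over frames. It is an illustration only, not the evaluation code used in this work. The function names `iou` and `average_overlap` are hypothetical, boxes are assumed to be given as (x, y, width, height), and benchmark protocols may additionally average across sequences or classes.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes, each assumed to be (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(predicted, ground_truth):
    """Mean per-frame IoU over one sequence (simplified, single-sequence AO)."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        return 0.0
    overlaps = [iou(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(overlaps) / len(overlaps)


if __name__ == "__main__":
    # Hypothetical two-frame sequence: a perfect match, then a box shifted by one unit.
    preds = [(0, 0, 2, 2), (1, 0, 2, 2)]
    truth = [(0, 0, 2, 2), (0, 0, 2, 2)]
    print(f"AO = {average_overlap(preds, truth):.3f}")
```

In this toy sequence the first frame overlaps perfectly and the second frame shares half of each box, so the per-frame overlaps are 1 and 1/3 before averaging.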