Transformer Visual Object Tracking Algorithm Based on Mixed Attention

Transformer-based visual object tracking algorithms capture the global information of a target well, but their representation of target features leaves room for improvement. To strengthen this representation, a Transformer visual object tracking algorithm based on mixed attention is proposed. First, a mixed attention module is introduced to capture target features in both the spatial and channel dimensions, modeling the contextual dependencies of the target features. Second, the feature maps are sampled by multiple parallel dilated convolutions with different dilation rates to obtain multi-scale image features and enhance the local feature representation. Finally, the constructed convolutional position encoding layer is added to the Transformer encoder to provide the tracker with accurate, length-adaptive position encoding, improving localization accuracy. Extensive experiments on OTB100, VOT2018 and LaSOT show that learning the relationships between features through the mixed-attention Transformer network yields a better representation of target features. Compared with other mainstream object tracking algorithms, the proposed algorithm achieves better tracking performance and runs at a real-time speed of 26 frames per second.
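The multi-scale sampling step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the dilation rates (1, 2, 3), the 3x3 all-ones kernel, and the reuse of one kernel across branches are all assumptions made for illustration, since the abstract does not give the layer configuration. The sketch only shows how a larger dilation rate widens the window seen by a kernel of fixed size.

```python
def dilated_conv2d(image, kernel, rate):
    """Same-size 2D dilated convolution (cross-correlation) with zero padding."""
    h, w = len(image), len(image[0])
    k = len(kernel)          # square kernel with odd side length
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Kernel taps are spaced `rate` pixels apart.
                    yy = y + (i - half) * rate
                    xx = x + (j - half) * rate
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def receptive_field(kernel_size, rate):
    """Side length of the window covered by one dilated kernel."""
    return rate * (kernel_size - 1) + 1


def parallel_dilated_features(image, kernel, rates=(1, 2, 3)):
    """Run one branch per dilation rate; returns one feature map per rate.

    Assumption: every branch reuses the same kernel. A trained network
    would normally learn separate weights for each branch.
    """
    return [dilated_conv2d(image, kernel, r) for r in rates]


# Toy 7x7 input with a single bright pixel in the centre.
image = [[0.0] * 7 for _ in range(7)]
image[3][3] = 1.0
box = [[1.0] * 3 for _ in range(3)]  # illustrative 3x3 all-ones kernel

features = parallel_dilated_features(image, box)
for rate, fmap in zip((1, 2, 3), features):
    print(rate, receptive_field(3, rate), sum(map(sum, fmap)))
```

Each branch responds to the same bright pixel, but at positions spaced further apart as the rate grows, which is how parallel branches gather context at several scales without adding kernel weights.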

computer vision; object tracking; Siamese network; deep learning; attention mechanism; Transformer

Hou Zhiqiang (侯志强), Guo Fan (郭凡), Yang Xiaolin (杨晓麟), Ma Sugang (马素刚), Fan Jiulun (范九伦)

School of Computer Science, Xi'an University of Posts and Telecommunications, Xi'an 710121

School of Communications and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121

National Natural Science Foundation of China

62072370

2024

Control and Decision
Northeastern University

CSTPCD, Peking University Core Journals
Impact factor: 1.227
ISSN: 1001-0920
Year, volume (issue): 2024, 39(3)