
Semantics-based local attention visual Transformer method on small datasets

When trained from scratch on small datasets, vision Transformers cannot match convolutional neural networks of comparable scale. Image-based local attention can significantly improve the data efficiency of ViT, but it loses information between distant yet related patches. To address this problem, this paper proposes a bidirectional parallel local attention vision Transformer. The method first groups patches at the feature level and performs local attention within each group, exploiting the relationships among patches in feature space to compensate for the lost information. Second, to fuse information across patches effectively, it combines semantics-based local attention and image-based local attention in parallel, enhancing the ViT model's performance on small datasets through bidirectional adaptive learning. Experimental results show that, with 15.2 GFLOPs of computation and 57.2 M parameters, the method achieves 97.93% and 85.80% accuracy on the CIFAR-10 and CIFAR-100 datasets, respectively. Compared with other methods, the bidirectional parallel local attention vision Transformer strengthens local guidance while preserving the effectiveness of the properties required for local attention.
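The core idea of semantics-based local attention — grouping patch tokens by feature similarity and attending only within each group — can be sketched roughly as follows. This is a minimal NumPy illustration, assuming simple k-means grouping and single-head, projection-free attention; the paper's actual grouping strategy and the bidirectional parallel fusion with image-based local attention are not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def semantic_local_attention(x, num_groups=4, iters=10, seed=0):
    """Group patch tokens in feature space (plain k-means), then run
    self-attention only within each group.

    x: (N, D) array of patch features.
    Returns (out, assign): attended features and each patch's group id.
    """
    n, d = x.shape
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(n, num_groups, replace=False)].copy()
    for _ in range(iters):
        # Assign each patch to its nearest center in feature space.
        assign = ((x[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for g in range(num_groups):
            if (assign == g).any():
                centers[g] = x[assign == g].mean(axis=0)
    out = np.empty_like(x)
    for g in range(num_groups):
        idx = np.where(assign == g)[0]
        if idx.size == 0:
            continue
        xg = x[idx]
        # Scaled dot-product attention restricted to the group's patches.
        attn = softmax(xg @ xg.T / np.sqrt(d))
        out[idx] = attn @ xg
    return out, assign
```

Distant but semantically related patches land in the same group and can exchange information directly, which is what a fixed image-space window cannot provide.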

deep learning; image classification; Transformer; local attention; semantics-based local attention

Feng Xin (冯欣), Wang Junjie (王俊杰), Zhong Sheng (钟声), Fang Tingting (方婷婷)


School of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China

College of Artificial Intelligence, Southwest University, Chongqing 400715, China


2025

Application Research of Computers (计算机应用研究)
Sponsor: Sichuan Electronic Computer Application Research Center

Indexed: PKU Core Journals (北大核心)
Impact factor: 0.93
ISSN: 1001-3695
Year, Volume (Issue): 2025, 42(1)