Fine-grained Image Classification Based on Bi-level Routing Attention and Feature Fusion

In recent years, the Vision Transformer (ViT) has achieved breakthroughs in image recognition. Its self-attention mechanism extracts discriminative token information from the patches of an image, which improves classification accuracy. Fine-grained image classification is a particularly hard case because feature differences between classes are small while feature differences within a class are large. To address the small, non-uniform data distributions and hard-to-detect inter-class differences that characterize this task, a fine-grained image classification model based on Bi-level Routing Attention (BRA) is proposed. The backbone network is a recent vision Transformer with a multi-stage hierarchical architecture, used as the visual feature extractor to obtain local information, global information, and multi-scale features. Feature enhancement and fusion modules are also introduced to improve the network's ability to learn key features. Experimental results show that the model reaches classification accuracies of 91.7% and 92.2% on the fine-grained image datasets CUB-200-2011 and Stanford Dogs, respectively, outperforming several mainstream fine-grained image classification models.
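The abstract names Bi-level Routing Attention but does not describe its mechanics. The following is a minimal sketch of the general idea as it is commonly described: tokens are grouped into regions, each region is routed to its top-k most relevant regions using coarse region-level affinities, and fine-grained token-to-token attention is then computed only over the routed regions. It is not the authors' implementation. The function name, the 1D contiguous regions (rather than 2D image windows), the mean-pooled region descriptors, and the omission of learned projections, multiple heads, and any local-context term are all simplifying assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vec(vecs):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]


def bi_level_routing_attention(q, k, v, region_size, topk):
    """Hypothetical single-head sketch of bi-level routing attention.

    q, k, v: lists of token vectors, assumed to be already projected.
    Tokens are split into contiguous regions of `region_size` tokens
    (an assumption; image models would use 2D windows instead).
    Returns (outputs, routes), where routes[r] lists the regions that
    region r attends to.
    """
    n = len(q)
    assert n % region_size == 0, "token count must divide into regions"
    num_regions = n // region_size
    regions = [list(range(r * region_size, (r + 1) * region_size))
               for r in range(num_regions)]

    # Level 1: coarse region-to-region routing on mean-pooled descriptors.
    q_r = [mean_vec([q[i] for i in reg]) for reg in regions]
    k_r = [mean_vec([k[i] for i in reg]) for reg in regions]
    routes = []
    for r in range(num_regions):
        affinity = [dot(q_r[r], k_r[s]) for s in range(num_regions)]
        ranked = sorted(range(num_regions),
                        key=lambda s: affinity[s], reverse=True)
        routes.append(ranked[:topk])

    # Level 2: token-to-token attention restricted to the routed regions.
    scale = 1.0 / math.sqrt(len(q[0]))
    dim_v = len(v[0])
    out = []
    for r, reg in enumerate(regions):
        gathered = [j for s in routes[r] for j in regions[s]]
        for i in reg:
            weights = softmax([dot(q[i], k[j]) * scale for j in gathered])
            out.append([sum(w * v[j][d] for w, j in zip(weights, gathered))
                        for d in range(dim_v)])
    return out, routes
```

With `topk` equal to the number of regions, every token attends to every other token and the sketch reduces to ordinary scaled dot-product attention; smaller values of `topk` trade coverage for fewer key-value pairs per query.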

fine-grained image classification; neural network; vision Transformer; attention mechanism; feature fusion

Shen Yuqi (沈宇麒), Cui Yan (崔衍)

School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China

China Postdoctoral Science Foundation (2020M671554)

2024

Computer Technology and Development (计算机技术与发展)
Shaanxi Computer Society

CSTPCD
Impact factor: 0.621
ISSN:1673-629X
Year, volume (issue): 2024, 34(6)
  • 30