Combining Gated Mechanism and Multi-Scale ViT for Fine-Grained Image Classification
Existing fine-grained image classification methods face three problems: convolutional neural networks have limited feature-extraction ability, the model's high-level features lack the local detail needed to tell apart the subtle differences between subclasses, and ViT has a weak inductive bias. To address these problems, this paper proposes GCAMT, a hybrid model that combines ConvNets with an attention multi-scale ViT for fine-grained image classification. First, a gating mechanism selectively extracts and fuses discriminative features. Second, a feature reactivation module updates potentially useful redundant features to make feature reuse more efficient. Third, the stages of the model are densely connected to improve generalization. Finally, an attention multi-scale ViT better extracts and fuses features that are inconsistent in scale and semantics, strengthening the model's modeling capability. Experiments show that the method reaches accuracies of 93.1%, 96.29%, 94.47%, and 93.82% on the public fine-grained image datasets CUB-200-2011, Stanford Cars, FGVC-Aircraft, and NABirds, respectively, outperforming current state-of-the-art (SOTA) methods.
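The abstract names a gating mechanism that selectively extracts and fuses discriminative features but does not give its formulation. As a rough illustration only, the sketch below assumes one common form of gated fusion: a sigmoid gate, computed from the concatenation of two feature vectors, decides channel by channel how much of each branch to keep. The names `sigmoid` and `gated_fusion`, the per-channel gate, and the weight layout are hypothetical and are not taken from GCAMT. A real model would apply this to feature-map tensors in a deep-learning framework rather than to Python lists.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping any real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


def gated_fusion(
    a: Sequence[float],
    b: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Fuse two equal-length feature vectors with a learned per-channel gate.

    Hypothetical formulation (an assumption, not the paper's definition):
        g   = sigmoid(W @ concat(a, b) + bias)
        out = g * a + (1 - g) * b
    `weights` holds one row per channel, each of length len(a) + len(b).
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    concat = list(a) + list(b)
    if len(weights) != len(a) or len(bias) != len(a):
        raise ValueError("need one weight row and one bias per channel")
    if any(len(row) != len(concat) for row in weights):
        raise ValueError("each weight row must match the concatenated length")

    fused = []
    for c in range(len(a)):
        logit = sum(w * x for w, x in zip(weights[c], concat)) + bias[c]
        g = sigmoid(logit)
        # g near 1 keeps channel c from branch a; g near 0 takes it from branch b.
        fused.append(g * a[c] + (1.0 - g) * b[c])
    return fused
```

With all-zero weights and bias the gate is 0.5 everywhere, so `gated_fusion([1.0, 2.0], [3.0, 4.0], [[0.0] * 4] * 2, [0.0, 0.0])` returns the element-wise average `[2.0, 3.0]`. Training the weights lets the gate lean toward whichever branch carries more discriminative detail for each channel.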

Fine-grained image classification; Gated feature fusion; Feature reactivation; Multi-scale; Self-attention

姜苏城 (Jiang Sucheng), 王红林 (Wang Honglin)


School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044, China


Funding: Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 62101275)

2024

Computer Simulation (计算机仿真)
The 17th Research Institute of China Aerospace Science and Industry Corporation

CSTPCD
Impact factor: 0.518
ISSN: 1006-9348
Year, Volume (Issue): 2024, 41(9)