Combining a Gating Mechanism and Multi-Scale ViT for Fine-Grained Image Classification
To address three problems in existing fine-grained image classification methods, namely the insufficient feature extraction ability of convolutional neural networks, the lack of local detail in the model's high-level features needed to identify subtle differences between subclasses, and the weak inductive bias of ViT, this paper proposes GCAMT, a hybrid model combining ConvNets with an attentional multi-scale ViT for fine-grained image classification tasks. First, a gating mechanism selectively extracts and fuses discriminative features; next, a feature reactivation module updates potentially redundant features to improve the model's feature reuse efficiency; then, the stages of the model are densely connected to improve generalization ability; finally, the attentional multi-scale ViT extracts and fuses features that are inconsistent in scale and semantics, enhancing the model's modeling capacity. Experimental results show that the proposed method reaches accuracies of 93.1%, 96.29%, 94.47%, and 93.82% on the public fine-grained image datasets CUB-200-2011, Stanford Cars, FGVC-Aircraft, and NABirds respectively, outperforming current SOTA methods.
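The abstract does not specify the exact form of GCAMT's gating mechanism, but the general idea of gated feature fusion can be illustrated as follows. This is a minimal NumPy sketch, not the authors' implementation: the function name `gated_fusion` and the single-layer gate (a learned linear map followed by a sigmoid) are illustrative assumptions. The gate produces a per-channel weight in (0, 1) that decides how much of each branch's features to keep, so the fused vector selectively retains discriminative features from both branches.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_conv, f_attn, W, b):
    """Illustrative gated fusion of two feature vectors (hypothetical form).

    A sigmoid gate, computed from both inputs, weights each channel:
    fused = g * f_conv + (1 - g) * f_attn, with g in (0, 1).
    """
    z = np.concatenate([f_conv, f_attn])   # joint view of both branches
    g = sigmoid(W @ z + b)                 # per-channel gate in (0, 1)
    return g * f_conv + (1.0 - g) * f_attn

# Toy usage with random features and randomly initialized gate weights.
rng = np.random.default_rng(0)
d = 8                                      # feature dimension (toy size)
f_conv = rng.standard_normal(d)            # e.g. a ConvNet branch feature
f_attn = rng.standard_normal(d)            # e.g. an attention branch feature
W = rng.standard_normal((d, 2 * d)) * 0.1  # gate weights (would be learned)
b = np.zeros(d)
fused = gated_fusion(f_conv, f_attn, W, b)
```

Because the gate is a per-channel convex combination, each fused value lies between the corresponding values of the two input features, which is what lets the model interpolate between branches channel by channel rather than picking one branch wholesale.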