Multi-Region Attention Network for Fine-Grained Image Classification
The main challenge in fine-grained image classification is to accurately locate highly discriminative local regions and auxiliary distinguishing features within an image. To address this, a multi-region attention network for fine-grained image classification is proposed. Features are first extracted with Inception-V3. Attention erasure is then applied repeatedly to direct the model toward secondary features. More precise local images are subsequently generated by removing the background and up-sampling. The positions of the extracted local features are then analyzed statistically, and the overall image is represented by a rectangular box so that little detailed information is lost. Further detailed learning is carried out on both the local and the overall images. In addition, a joint loss function is designed to strengthen the model's recognition ability by dynamically balancing difficult and easy samples and by reducing intra-class variance. Experiments on the public fine-grained image datasets CUB-200-2011, Stanford-Cars, and FGVC-Aircraft show that the method achieves accuracies of 89.2%, 94.8%, and 94.0%, respectively, surpassing the other methods compared.
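As a rough illustration of the attention-erasure and crop-then-up-sample steps summarized above, the following minimal sketch works on a toy 4x4 attention map in plain Python. It is not the authors' implementation. The function names (erase_salient, salient_box, crop_and_upsample), the relative threshold ratio of 0.5, and the use of nearest-neighbor up-sampling are all assumptions made for illustration. In the proposed network the attention maps would come from Inception-V3 features rather than a hand-written grid.

```python
def erase_salient(attention, ratio=0.5):
    """Zero every cell whose value is at least ratio * max (assumed erasure rule)."""
    cutoff = ratio * max(max(row) for row in attention)
    return [[0.0 if v >= cutoff else v for v in row] for row in attention]


def salient_box(attention, ratio=0.5):
    """Inclusive (top, left, bottom, right) box around cells >= ratio * max."""
    cutoff = ratio * max(max(row) for row in attention)
    rows = [r for r, row in enumerate(attention) if any(v >= cutoff for v in row)]
    cols = [c for c in range(len(attention[0]))
            if any(row[c] >= cutoff for row in attention)]
    return rows[0], cols[0], rows[-1], cols[-1]


def crop_and_upsample(image, box, out_h, out_w):
    """Crop the box from image and resize it with nearest-neighbor sampling."""
    top, left, bottom, right = box
    crop = [row[left:right + 1] for row in image[top:bottom + 1]]
    in_h, in_w = len(crop), len(crop[0])
    return [[crop[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


# Toy attention map: a strong peak in the middle, a weaker one bottom-right.
attention = [
    [0.1, 0.1, 0.1, 0.0],
    [0.1, 0.9, 0.8, 0.1],
    [0.1, 0.7, 1.0, 0.3],
    [0.0, 0.1, 0.3, 0.4],
]
image = [[r * 4 + c for c in range(4)] for r in range(4)]  # stand-in pixel grid

primary_box = salient_box(attention)               # most salient region
erased = erase_salient(attention)                  # suppress that region
secondary_box = salient_box(erased)                # next most salient region
local_image = crop_and_upsample(image, primary_box, 4, 4)
```

After the strongest region is erased, the second call to salient_box lands on the weaker bottom-right response, which is the intended effect of erasure: the next pass is pushed toward secondary cues instead of re-selecting the same peak.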