Global Cross-layer Interaction Networks for Learning Fine-grained Image Feature Representations
The key task in fine-grained visual categorization is to extract highly discriminative features. Previous models often address this problem with bilinear pooling and its variants. However, most of these methods ignore intra-layer or inter-layer feature interactions. Such insufficient interaction can easily cause discriminative information to be lost, or leave it mixed with too much redundant information. To address these problems, a new network for learning fine-grained image features and feature representations is designed: the Global Cross-layer Interaction (GCI) network. The proposed hierarchical bicubic pooling method balances the ability to extract discriminative information against the ability to filter out redundant information, and it models feature interactions both within and between layers. The interactive computing structure is combined with an existing channel attention mechanism to form an interactive attention mechanism, which improves the backbone network's ability to extract key features. Finally, the feature extraction network built with the interactive attention mechanism is fused with the bicubic pooling method to obtain GCI, which extracts robust fine-grained image feature representations. Experiments on three fine-grained benchmark datasets show that hierarchical bicubic pooling achieves the best results within the hierarchical interactive pooling framework, with classification accuracies of 87.4%, 93.2%, and 92.1% on CUB-200-2011, Stanford-Cars, and FGVC-Aircraft, respectively. After the interactive attention mechanism is integrated, accuracy further improves to 88.5%, 95.1%, and 93.9%.
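The abstract does not give the pooling formula, so the following is only a rough illustration of how within-layer and between-layer interactions can be pooled together. It assumes the element-wise-product formulation of hierarchical bilinear pooling: features from several layers, already projected to a common channel dimension, are multiplied position by position and then sum-pooled over space. It reads "bicubic" as the third-order analogue, meaning products of three feature maps. The function names, the toy layers `conv3`/`conv4`/`conv5`, and the signed-square-root plus L2 normalisation step are all assumptions for illustration, not the GCI implementation. Allowing repeated layer indices yields intra-layer terms such as `(0, 0, 0)`, while distinct indices yield inter-layer terms such as `(0, 1, 2)`.

```python
import math
from itertools import combinations_with_replacement
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a feature map is a list of spatial positions,
# each a length-C vector (layers assumed already projected to the same C).
FeatureMap = List[List[float]]


def hadamard(*vectors: Sequence[float]) -> List[float]:
    """Element-wise product of equally sized vectors."""
    out = [1.0] * len(vectors[0])
    for vec in vectors:
        out = [o * v for o, v in zip(out, vec)]
    return out


def interaction_pool(*maps: FeatureMap) -> List[float]:
    """Multiply the layers position by position, then sum-pool over space."""
    channels = len(maps[0][0])
    pooled = [0.0] * channels
    for position in zip(*maps):  # the same spatial position in every layer
        prod = hadamard(*position)
        pooled = [p + q for p, q in zip(pooled, prod)]
    return pooled


def signed_sqrt_l2(vec: List[float]) -> List[float]:
    """Signed square root followed by L2 normalisation (assumed post-processing)."""
    out = [math.copysign(math.sqrt(abs(v)), v) for v in vec]
    norm = math.sqrt(sum(v * v for v in out)) or 1.0  # avoid dividing by zero
    return [v / norm for v in out]


def hierarchical_pool(layers: List[FeatureMap], order: int = 3) -> Dict[Tuple[int, ...], List[float]]:
    """Pool every `order`-way combination of layers, repetition allowed, so
    intra-layer terms like (0, 0, 0) sit alongside cross-layer ones."""
    return {
        combo: signed_sqrt_l2(interaction_pool(*(layers[i] for i in combo)))
        for combo in combinations_with_replacement(range(len(layers)), order)
    }


# Three hypothetical layers, each with 2 spatial positions and 2 channels.
conv3 = [[1.0, 2.0], [0.5, -1.0]]
conv4 = [[2.0, 0.0], [1.0, 1.0]]
conv5 = [[1.0, 1.0], [-1.0, 2.0]]
pooled = hierarchical_pool([conv3, conv4, conv5], order=3)
descriptor = [x for combo in sorted(pooled) for x in pooled[combo]]
print(len(pooled), len(descriptor))  # 10 20
```

Setting `order=2` gives a pairwise, bilinear-style counterpart for comparison. In hierarchical bilinear pooling, learned projections are applied to each layer before the products and the concatenated descriptor feeds a classifier; this sketch omits those learned parts.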