Combining Cross-layer Feature Fusion Network and Non-local Recognition Method
In earlier image recognition methods, the primary objective is to extract a large number of distinctive local key features. Fine-grained image classification subdivides a broader category into specific subcategories, so the differences between the fine categories are minimal. To improve accuracy, it is therefore crucial to identify local regions with strong discriminative power. In their work, the authors employ a target-navigation network to select the top k most discriminative local region blocks. They then apply the concept of the non-local module to capture the relationships among these local regions, using the image information more comprehensively to improve precision. In the residual network, convolutional attention modules are used to capture the interrelations among the different channel attention features. Moreover, at the final fully connected layer, they modify the network architecture to use cross-layer feature fusion instead of simply cascading the features. Because labeling images for fine-grained recognition requires extensive human and material resources, the authors also propose a self-supervised method.
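The select-then-relate step described above (keep the top k regions, then let each region attend to the others) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The region scores are taken as given, whereas in the described method they would come from the target-navigation network. The non-local module uses the Gaussian form with identity embeddings, meaning there are no learned θ, φ, g, or output projections. `select_top_k` and `non_local` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = List[float]


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring regions, best first.

    Hypothetical stand-in for the navigation network's ranking step:
    the informativeness scores are assumed to be given, not predicted.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def softmax(values: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(values)  # subtract the max so exp() cannot overflow
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def non_local(features: Sequence[Sequence[float]]) -> List[Vector]:
    """Gaussian non-local operation with identity embeddings plus a residual.

    For each region i:
        y_i = sum_j softmax_j(x_i . x_j) * x_j   (relation to every region j)
        z_i = x_i + y_i                         (residual connection)
    """
    out: List[Vector] = []
    for x_i in features:
        # Pairwise affinities between region i and all selected regions.
        weights = softmax([dot(x_i, x_j) for x_j in features])
        # Weighted sum of all region features, one output dimension at a time.
        y_i = [
            sum(w * x_j[d] for w, x_j in zip(weights, features))
            for d in range(len(x_i))
        ]
        out.append([a + b for a, b in zip(x_i, y_i)])
    return out


# Usage: four candidate regions with 2-D features and made-up scores.
scores = [0.1, 0.9, 0.4, 0.7]
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
top = select_top_k(scores, 2)
fused = non_local([regions[i] for i in top])
```

In a real network the embeddings would be learned and the fused region features would feed the classifier. The identity-embedding choice is made here only so that the weighting logic is visible without any trained parameters.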