Deep Hashing for Fine-Grained Image Retrieval Combining Global and Local Features
Existing deep hashing methods for fine-grained image retrieval often focus on extracting high-resolution features while neglecting other valuable regions and failing to consider the interaction between global and local features. This results in insufficient feature extraction, loss of local information, and feature redundancy. To address these problems, this paper introduces and improves the Conformer model. First, a feature refinement module weakens the features of high-response regions, allowing the model to attend to the remaining feature information and obtain complete image features. Second, the convolutional blocks of the CNN branch in the Conformer model are replaced with a multi-scale dilated convolution module, enabling interaction between global and local features through a network with stronger feature extraction capability and thereby extracting richer image features. Finally, a multi-branch attention module with a diversity loss is introduced to obtain attention features of different dimensions, adaptively filtering more fine-grained local features while removing background interference and suppressing redundant features. Experimental results show that, using 16-bit hash codes, the proposed method achieves classification accuracies of 65.73%, 81.32%, 76.45%, and 68.41% on four fine-grained image datasets (CUB-200-2011, FGVC-Aircraft, Stanford Cars, and Stanford Dogs, respectively), outperforming existing deep hashing methods for fine-grained image retrieval.
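For readers unfamiliar with hashing-based retrieval, the following minimal sketch illustrates how 16-bit hash codes are typically used at query time: real-valued network outputs are binarized by sign, and database images are ranked by Hamming distance to the query code. It is a generic illustration, not the method proposed here; the function names `binarize`, `hamming_distance`, and `rank_by_hamming` and the sign-thresholding rule are assumptions made for the example.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack the sign bits of a real-valued vector into an integer hash code.

    Assumption: a value > 0 maps to bit 1, otherwise bit 0.
    The first feature becomes the most significant bit.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: Sequence[int]) -> List[int]:
    """Return database indices sorted by ascending Hamming distance to the query."""
    return sorted(
        range(len(database)),
        key=lambda i: hamming_distance(query, database[i]),
    )


if __name__ == "__main__":
    # Hypothetical 16-dimensional network outputs for one query image.
    query_features = [0.7, -0.2, 0.1, -0.9, 0.4, 0.3, -0.5, 0.8,
                      -0.1, 0.6, -0.3, 0.2, 0.9, -0.7, 0.5, -0.4]
    query_code = binarize(query_features)

    # Hypothetical database codes: an exact copy, a one-bit change, and the complement.
    database_codes = [query_code ^ 0xFFFF, query_code, query_code ^ 0x0001]
    print(rank_by_hamming(query_code, database_codes))
```

Because comparing codes reduces to an XOR and a bit count, short codes such as the 16-bit setting keep both storage and query cost low, which is why retrieval quality at small code lengths is the figure of interest.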