Fusing global aggregation and local mining for architectural image retrieval
To address the low retrieval accuracy caused by scale variations and local occlusions in architectural image retrieval, this paper proposes an architectural image retrieval network that integrates global aggregation and local mining. Following the ResNet50 backbone, the method introduces a global branch for multi-scale feature aggregation and a local branch for attention-guided feature mining. The network efficiently integrates the complementary features of the two branches through an orthogonal fusion module. Specifically, the multi-scale feature aggregation module uses mixed dilated convolutions and channel attention to adaptively aggregate targets of different scales across the whole image, enhancing the network's ability to extract multi-scale salient features from architectural images. The attention-guided feature mining module employs information-complementary attention to mark and erase the most salient feature, thereby mining potential detail information in local regions. The proposed method achieves a mean average precision (mAP) of 81.54% (M) and 62.43% (H) on the ROxf dataset, and 90.28% (M) and 78.35% (H) on the RPar dataset, two mainstream architectural image retrieval benchmarks. Experimental results indicate that the method effectively overcomes the interference of scale variations and local occlusions and significantly improves the accuracy of architectural image retrieval.
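The abstract does not spell out how the orthogonal fusion module combines the two branches. As an illustration only, the sketch below assumes the common projection-based formulation: each local descriptor has its projection onto the global descriptor removed, the orthogonal residuals are average-pooled, and the result is concatenated with the global descriptor. The function names, the plain-list representation, and the pooling choice are hypothetical and are not taken from the paper.

```python
def orthogonal_component(local_vec, global_vec):
    """Return the part of local_vec that is orthogonal to global_vec."""
    g_norm_sq = sum(g * g for g in global_vec)
    if g_norm_sq == 0.0:
        # A zero global descriptor has no direction to project onto.
        return list(local_vec)
    # Scalar projection coefficient <l, g> / <g, g>.
    scale = sum(l * g for l, g in zip(local_vec, global_vec)) / g_norm_sq
    return [l - scale * g for l, g in zip(local_vec, global_vec)]


def orthogonal_fusion(local_feats, global_vec):
    """Fuse local descriptors (one per spatial location) with a global one.

    Assumption: residuals are average-pooled over locations and appended
    to the global descriptor; the paper may pool or weight differently.
    """
    residuals = [orthogonal_component(v, global_vec) for v in local_feats]
    n = len(residuals)
    pooled = [sum(column) / n for column in zip(*residuals)]
    return list(global_vec) + pooled


if __name__ == "__main__":
    global_vec = [1.0, 0.0, 0.0]                       # toy global descriptor
    local_feats = [[2.0, 1.0, 0.0], [0.5, 0.0, 3.0]]   # two toy locations
    print(orthogonal_fusion(local_feats, global_vec))
```

Removing the projection means the local part of the fused descriptor carries only information that the global descriptor does not already encode, which is one plausible reading of "complementary features" here.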