Orthogonal fusion image descriptor based on global attention
Image descriptors are an important research object in computer vision and are widely applied to image classification, segmentation, recognition, and retrieval. Existing deep image descriptors lack modeling of the correlation between the high-dimensional feature space and channel information in the local feature extraction branch, so the local features carry insufficient information. To address this, an image descriptor combining local and global features was proposed. In the local feature extraction branch, multi-scale feature maps were extracted through dilated convolutions; after the outputs were concatenated, the relevant channel-spatial information was captured by a global attention mechanism with a multilayer perceptron, which then produced the final local features. The high-dimensional global branch generated a global feature vector through global pooling and fully connected layers. The component of the local features orthogonal to the global feature vector was then extracted and concatenated with the global features to form the final descriptor. In addition, the robustness of the model on large-scale datasets was enhanced by employing an angular-domain loss function with sub-class centers. Experimental results on the publicly available ROxford5k and RParis6k datasets demonstrated that, in the medium and hard evaluation modes, the average retrieval accuracy of this descriptor reached 81.87% and 59.74% on ROxford5k, and 91.61% and 79.12% on RParis6k, improvements of 1.70% and 1.56%, and 2.00% and 1.83%, respectively, over the deep orthogonal fusion descriptor. It thus exhibited superior retrieval accuracy over the other image descriptors compared.
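As a concrete illustration of the orthogonal fusion step described above, the following is a minimal PyTorch-style sketch, not the paper's implementation: the function name, tensor shapes, and aggregation choice (average pooling) are assumptions. It removes from each local feature its projection onto the global vector, pools the orthogonal remainder, and concatenates it with the global vector.

```python
import torch
import torch.nn.functional as F

def orthogonal_fusion(local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
    """Fuse local and global features via orthogonal decomposition (sketch).

    local_feat:  (B, C, H, W) local feature map from the attention branch.
    global_feat: (B, C)       global vector from pooling + fully connected layers.
    Returns a (B, 2*C) descriptor: pooled orthogonal component || global vector.
    """
    B, C, _, _ = local_feat.shape
    g = global_feat.view(B, C, 1, 1)
    # Coefficient of the projection of each local feature onto the global vector.
    proj_coef = (local_feat * g).sum(dim=1, keepdim=True) / (g.pow(2).sum(dim=1, keepdim=True) + 1e-6)
    # Component of the local features orthogonal to the global vector.
    orth = local_feat - proj_coef * g
    # Aggregate the orthogonal map (pooling choice is an assumption) and concatenate.
    orth_pooled = F.adaptive_avg_pool2d(orth, 1).flatten(1)
    return torch.cat([orth_pooled, global_feat], dim=1)
```

The subtraction guarantees that the pooled local part contributes only information not already encoded in the global vector, which is the stated motivation for orthogonal fusion.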
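The angular-domain loss with sub-class centers is in the spirit of sub-center ArcFace; a hedged sketch follows in which the number of sub-centers k, the margin, and the scale are illustrative hyperparameters rather than values reported in the paper. Each class holds k learnable centers, the best-matching sub-center is selected per sample, and an angular margin is added to the target logit.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubCenterArcFace(nn.Module):
    """Angular-margin loss with k sub-centers per class (sub-center ArcFace style sketch)."""

    def __init__(self, in_dim: int, num_classes: int, k: int = 3,
                 margin: float = 0.3, scale: float = 30.0):
        super().__init__()
        self.k, self.margin, self.scale = k, margin, scale
        # k weight rows per class, grouped in consecutive blocks.
        self.weight = nn.Parameter(torch.empty(num_classes * k, in_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity to every sub-center, then max over the k sub-centers of each class.
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        cos = cos.view(-1, cos.shape[1] // self.k, self.k).max(dim=2).values
        # Add the angular margin to the target-class logit only.
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, num_classes=theta.shape[1]).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cos)
        return F.cross_entropy(self.scale * logits, labels)
```

Multiple sub-centers let noisy or multi-modal classes in large-scale training data attach to different centers, which is the usual argument for the robustness gain the abstract attributes to this loss.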