
Orthogonal fusion image descriptor based on global attention

Image descriptors are a key research object in computer vision and are widely used in image classification, segmentation, recognition, and retrieval. In the local feature extraction branch, deep image descriptors lack correlation between the spatial and channel information of high-dimensional features, so local features carry insufficient information. To address this, an image descriptor fusing local and global features is proposed. In the local feature extraction branch, dilated convolutions extract multi-scale feature maps; the concatenated outputs pass through a global attention mechanism containing a multilayer perceptron to capture correlated channel-spatial information, and after further processing the final local features are produced. The high-dimensional global branch generates a global feature vector through global pooling and full convolution. The components of the local features orthogonal to the global feature vector are extracted and concatenated with the global features, then aggregated into the final descriptor. For the feature constraint, an angular-margin loss function with sub-class centers is used to increase the model's robustness on large-scale datasets. In experiments on the public benchmarks Roxford5k and Rparis6k, the mean retrieval precision of the proposed descriptor reaches 81.87% and 59.74% (Roxford5k) and 91.61% and 79.12% (Rparis6k) in the medium and hard modes, improvements of 1.70%, 1.56%, 2.00%, and 1.83% over the deep orthogonal fusion descriptor, giving better retrieval precision than other image descriptors.
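The local branch described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual implementation: layer sizes, dilation rates, and weight shapes are illustrative assumptions. Dilated 3×3 convolutions at several rates produce multi-scale maps of the same spatial size; the maps are concatenated along the channel axis and reweighted by an MLP-based channel attention (a simplified stand-in for the global attention mechanism the abstract mentions).

```python
import numpy as np

rng = np.random.default_rng(0)

def dilated_conv(x, w, rate):
    """3x3 dilated convolution with 'same' padding.
    x: (C, H, W) input, w: (Cout, C, 3, 3) weights, rate: dilation."""
    Cout = w.shape[0]
    C, H, W = x.shape
    pad = rate  # keeps the output the same spatial size for a 3x3 kernel
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((Cout, H, W))
    for i in range(3):
        for j in range(3):
            # sample the input at dilated offsets and accumulate
            patch = xp[:, i * rate:i * rate + H, j * rate:j * rate + W]
            out += np.einsum('oc,chw->ohw', w[:, :, i, j], patch)
    return out

def mlp_channel_attention(x, w1, w2):
    """Global-average-pool -> 2-layer MLP -> sigmoid channel weights."""
    v = x.mean(axis=(1, 2))            # (C,) pooled channel descriptor
    h = np.maximum(w1 @ v, 0)          # ReLU hidden layer
    a = 1.0 / (1.0 + np.exp(-(w2 @ h)))  # sigmoid attention weights, (C,)
    return x * a[:, None, None]

# Toy multi-scale local branch: three dilation rates, channel-wise concat.
C, H, W = 8, 16, 16
x = rng.standard_normal((C, H, W))
ws = [rng.standard_normal((C, C, 3, 3)) * 0.1 for _ in range(3)]
feats = [dilated_conv(x, w, r) for w, r in zip(ws, (1, 2, 3))]
f = np.concatenate(feats, axis=0)      # (3C, H, W) multi-scale features
Cm = f.shape[0]
w1 = rng.standard_normal((Cm // 2, Cm)) * 0.1
w2 = rng.standard_normal((Cm, Cm // 2)) * 0.1
local = mlp_channel_attention(f, w1, w2)
print(local.shape)  # (24, 16, 16)
```

The real model would use learned weights and also a spatial attention stage; the sketch only shows how the multi-scale concatenation and the MLP-based channel reweighting fit together.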
Orthogonal fusion image descriptor based on global attention
Image descriptors are important research objects in computer vision tasks and are widely applied to the fields of image classification, segmentation, recognition, and retrieval. The deep image descriptor lacks the correlation between the high-dimensional feature space and channel information in the local feature extraction branch, resulting in insufficient information for local feature expression. Therefore, an image descriptor combining local and global features was proposed. Multi-scale feature maps were extracted through dilated convolution in the local feature extraction branch. After the output features were concatenated, the correlated channel-spatial information was captured through a global attention mechanism with a multilayer perceptron, and the final local features were output after further processing. The high-dimensional global branch generated a global feature vector through global pooling and full convolution. The component of the local features orthogonal to the global feature vector was extracted and then concatenated with the global features to form the final descriptor. Meanwhile, the robustness of the model on large-scale datasets was enhanced by employing an angular-margin loss function containing sub-class centers. Experimental results on the publicly available datasets Roxford5k and Rparis6k demonstrated that in the medium and hard modes, the mean retrieval precision of this descriptor reached 81.87% and 59.74%, and 91.61% and 79.12%, respectively, an improvement of 1.70% and 1.56%, and 2.00% and 1.83% over the deep orthogonal fusion descriptor. It exhibited superior retrieval accuracy over other image descriptors.
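The orthogonal-fusion step (the idea the abstract inherits from the deep orthogonal fusion descriptor it compares against) can be sketched as follows: each local feature vector is decomposed into its projection onto the global descriptor and the orthogonal residual; the residual is kept and stacked with the global vector. A minimal NumPy sketch, with illustrative dimensions:

```python
import numpy as np

def orthogonal_fusion(local, g):
    """local: (C, N) local feature vectors (one column per spatial location),
    g: (C,) global descriptor.
    Returns (2C, N): the component of each local vector orthogonal to g,
    concatenated with a copy of g for every location."""
    g = np.asarray(g, dtype=float)
    # projection of each column of `local` onto g: g * (g.l) / (g.g)
    proj = np.outer(g, g @ local) / (g @ g)
    orth = local - proj                      # orthogonal residual
    g_tiled = np.repeat(g[:, None], local.shape[1], axis=1)
    return np.concatenate([orth, g_tiled], axis=0)

rng = np.random.default_rng(1)
C, N = 4, 6
local = rng.standard_normal((C, N))
g = rng.standard_normal(C)
fused = orthogonal_fusion(local, g)
# sanity check: the orthogonal part has zero dot product with g
assert np.allclose(g @ fused[:C], 0.0)
print(fused.shape)  # (8, 6)
```

In a full pipeline the fused map would then be pooled (e.g., averaged over the N locations) and reduced to the final fixed-length descriptor; the sketch only demonstrates the orthogonal decomposition and concatenation.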

image descriptor; dilated convolution; global attention; feature fusion; sub-center arcface
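The "sub-center arcface" keyword refers to the Sub-center ArcFace loss: each class holds K sub-centers, the class logit is the maximum cosine similarity over its sub-centers, and an angular margin m is added to the ground-truth angle before scaled softmax. A small NumPy sketch of the logit computation; K, s, and m values here are common defaults, not necessarily the paper's settings:

```python
import numpy as np

def subcenter_arcface_logits(emb, W, labels, K=3, s=64.0, m=0.5):
    """emb: (B, D) embeddings; W: (num_classes*K, D) sub-center weights,
    rows ordered class-major (class c owns rows c*K .. c*K+K-1);
    labels: (B,) ground-truth class ids.
    Returns scaled logits (B, num_classes) with margin applied to the target."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    W = W / np.linalg.norm(W, axis=1, keepdims=True)
    cos = emb @ W.T                               # (B, num_classes*K)
    B = emb.shape[0]
    ncls = W.shape[0] // K
    cos = cos.reshape(B, ncls, K).max(axis=2)     # max-pool over sub-centers
    theta = np.arccos(np.clip(cos, -1 + 1e-7, 1 - 1e-7))
    target = theta[np.arange(B), labels] + m      # angular margin on true class
    cos_m = cos.copy()
    cos_m[np.arange(B), labels] = np.cos(target)
    return s * cos_m                              # feed to softmax cross-entropy

rng = np.random.default_rng(2)
B, D, ncls, K = 2, 8, 5, 3
emb = rng.standard_normal((B, D))
W = rng.standard_normal((ncls * K, D))
labels = np.array([0, 3])
logits = subcenter_arcface_logits(emb, W, labels, K=K)
print(logits.shape)  # (2, 5)
```

The multiple sub-centers absorb noisy or multi-modal samples within a class, which is what gives the loss its robustness on large-scale, imperfectly labeled datasets.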

AI Liefu, TAO Yong, JIANG Changyu


School of Computer and Information, Anqing Normal University, Anqing 246133, Anhui, China

Modern Industrial College of Smart Transportation, Anhui Sanlian University, Hefei 230601, Anhui, China

image descriptor; dilated convolution; global attention; feature fusion; sub-center ArcFace loss

Natural Science Foundation of Anhui Province (1608085MF144, 1908085MF194); Key Project of Natural Science Research of Universities in Anhui Province (KJ2020A0498)

2024

Journal of Graphics
China Graphics Society


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.73
ISSN:2095-302X
Year, volume (issue): 2024, 45(3)