Orthogonal fusion image descriptor based on global attention
Image descriptors are an important research object in computer vision and are widely applied to image classification, segmentation, recognition, and retrieval. Existing deep image descriptors lack modeling of the correlation between the high-dimensional feature space and channel information in the local feature extraction branch, so the local features carry insufficient information. To address this, an image descriptor combining local and global features was proposed. In the local feature extraction branch, multi-scale feature maps were extracted through dilated convolutions; after the outputs were concatenated, the relevant channel-spatial information was captured by a global attention mechanism with a multilayer perceptron, which then produced the final local features. The high-dimensional global branch generated a global feature vector through global pooling and fully connected layers. The component of the local features orthogonal to the global feature vector was then extracted and concatenated with the global features to form the final descriptor. In addition, the robustness of the model on large-scale datasets was enhanced by employing an angular-domain loss function with sub-class centers. Experimental results on the publicly available ROxford5k and RParis6k datasets demonstrated that, in the medium and hard evaluation modes, the average retrieval accuracy of this descriptor reached 81.87% and 59.74% on ROxford5k, and 91.61% and 79.12% on RParis6k, improvements of 1.70% and 1.56%, and 2.00% and 1.83%, respectively, over the deep orthogonal fusion descriptor. It thus exhibited superior retrieval accuracy over the other image descriptors compared.
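As a concrete illustration of the orthogonal fusion step described above, the following is a minimal PyTorch-style sketch, not the paper's implementation: the function name, tensor shapes, and aggregation choice (average pooling) are assumptions. It removes from each local feature its projection onto the global vector, pools the orthogonal remainder, and concatenates it with the global vector.

```python
import torch
import torch.nn.functional as F

def orthogonal_fusion(local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
    """Fuse local and global features via orthogonal decomposition (sketch).

    local_feat:  (B, C, H, W) local feature map from the attention branch.
    global_feat: (B, C)       global vector from pooling + fully connected layers.
    Returns a (B, 2*C) descriptor: pooled orthogonal component || global vector.
    """
    B, C, _, _ = local_feat.shape
    g = global_feat.view(B, C, 1, 1)
    # Coefficient of the projection of each local feature onto the global vector.
    proj_coef = (local_feat * g).sum(dim=1, keepdim=True) / (g.pow(2).sum(dim=1, keepdim=True) + 1e-6)
    # Component of the local features orthogonal to the global vector.
    orth = local_feat - proj_coef * g
    # Aggregate the orthogonal map (pooling choice is an assumption) and concatenate.
    orth_pooled = F.adaptive_avg_pool2d(orth, 1).flatten(1)
    return torch.cat([orth_pooled, global_feat], dim=1)
```

The subtraction guarantees that the pooled local part contributes only information not already encoded in the global vector, which is the stated motivation for orthogonal fusion.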
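The angular-domain loss with sub-class centers is in the spirit of sub-center ArcFace; a hedged sketch follows in which the number of sub-centers k, the margin, and the scale are illustrative hyperparameters rather than values reported in the paper. Each class holds k learnable centers, the best-matching sub-center is selected per sample, and an angular margin is added to the target logit.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubCenterArcFace(nn.Module):
    """Angular-margin loss with k sub-centers per class (sub-center ArcFace style sketch)."""

    def __init__(self, in_dim: int, num_classes: int, k: int = 3,
                 margin: float = 0.3, scale: float = 30.0):
        super().__init__()
        self.k, self.margin, self.scale = k, margin, scale
        # k weight rows per class, grouped in consecutive blocks.
        self.weight = nn.Parameter(torch.empty(num_classes * k, in_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity to every sub-center, then max over the k sub-centers of each class.
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        cos = cos.view(-1, cos.shape[1] // self.k, self.k).max(dim=2).values
        # Add the angular margin to the target-class logit only.
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, num_classes=theta.shape[1]).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cos)
        return F.cross_entropy(self.scale * logits, labels)
```

Multiple sub-centers let noisy or multi-modal classes in large-scale training data attach to different centers, which is the usual argument for the robustness gain the abstract attributes to this loss.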