
Cross-modal image and text retrieval based on graph convolution and multi-head attention

To address the problems that existing cross-modal retrieval methods struggle to weight the data at each node and are limited in mining local consistency within a modality, a cross-modal image and text retrieval method based on graph convolution and a multi-head attention mechanism is proposed. First, when constructing each modality graph, every individual image or text sample serves as an independent node, and graph convolution is used to extract the interaction information among samples, improving local consistency within each modality's data. Then, an attention mechanism is introduced into the graph convolution to adaptively learn a weight coefficient for each neighboring node, thereby distinguishing the influence of different neighbors on the central node. Finally, a multi-head attention layer with weight parameters is constructed to fully learn multiple groups of correlated features between nodes. Compared with eight existing methods, the proposed method raises the mAP obtained on the Wikipedia and Pascal Sentence datasets by 2.6%-42.5% and 3.3%-54.3%, respectively.
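As a rough illustration of the mechanism the abstract describes (samples as graph nodes, per-neighbor attention weights inside the graph convolution, and multiple heads capturing several groups of relations), below is a minimal PyTorch sketch of a multi-head graph attention layer. It is a GAT-style stand-in, not the authors' implementation; the class name, dimensions, and adjacency construction are all assumptions.

```python
# Hypothetical sketch of an attention-weighted graph convolution:
# each sample is a node, an attention coefficient is learned per
# neighbor, and several heads learn different groups of relations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadGraphAttention(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, num_heads: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.out_dim = out_dim
        # One linear projection per head, applied to all nodes at once.
        self.proj = nn.Linear(in_dim, num_heads * out_dim, bias=False)
        # Per-head attention vector scoring (center, neighbor) pairs.
        self.attn = nn.Parameter(torch.empty(num_heads, 2 * out_dim))
        nn.init.xavier_uniform_(self.attn)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node features; adj: (N, N) adjacency with self-loops.
        N = x.size(0)
        h = self.proj(x).view(N, self.num_heads, self.out_dim)            # (N, H, D)
        # e_ij = LeakyReLU(a^T [h_i || h_j]), split into center/neighbor parts.
        src = torch.einsum('nhd,hd->nh', h, self.attn[:, :self.out_dim])  # (N, H)
        dst = torch.einsum('nhd,hd->nh', h, self.attn[:, self.out_dim:])  # (N, H)
        e = F.leaky_relu(src.unsqueeze(1) + dst.unsqueeze(0), 0.2)        # (N, N, H)
        # Mask non-neighbors so softmax gives them zero weight.
        e = e.masked_fill(adj.unsqueeze(-1) == 0, float('-inf'))
        alpha = torch.softmax(e, dim=1)          # weights over neighbors j
        # Aggregate neighbor features with learned weights, then merge heads.
        out = torch.einsum('ijh,jhd->ihd', alpha, h)                      # (N, H, D)
        return out.reshape(N, self.num_heads * self.out_dim)

# Toy usage: 5 nodes (samples) with 16-dim features on a small path graph.
x = torch.randn(5, 16)
adj = torch.eye(5) + torch.diag(torch.ones(4), 1) + torch.diag(torch.ones(4), -1)
layer = MultiHeadGraphAttention(16, 8, num_heads=4)
print(layer(x, adj).shape)  # torch.Size([5, 32])
```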

attention weight; adjacency matrix; multi-head attention; common subspace; cross-modal retrieval

HUA Chunjian (化春键), ZHANG Hongtu (张宏图), JIANG Yi (蒋毅), YU Jianfeng (俞建峰), CHEN Ying (陈莹)


School of Mechanical Engineering, Jiangnan University, Wuxi 214122, Jiangsu, China

Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology, Wuxi 214122, Jiangsu, China

School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, Jiangsu, China


National Natural Science Foundation of China

62173160

2024

光电子·激光 (Journal of Optoelectronics·Laser)
Tianjin University of Technology; Chinese Optical Society


Peking University Core Journal (北大核心)
Impact factor: 1.437
ISSN: 1005-0086
Year, volume (issue): 2024, 35(9)