Analysis and mining method for multi-level image-text relations guided by local-global features
Text and image data with semantic relevance complement each other and can enhance semantic understanding from different perspectives. The key to making full use of image and text data therefore lies in mining the semantic relations between them. To address the insufficient mining of deep image-text semantic relations and the inaccurate predictions in the retrieval stage, this paper proposes an analysis and mining method for multi-level image-text relations guided by local-global features. A Transformer with a multi-head self-attention mechanism is used to model relations among image regions. An image-guided text attention module is constructed to explore the fine-grained relationship between image regions and the global text. Furthermore, local and global features are fused to effectively strengthen the semantic relationship between image and text data. To verify the proposed method, experiments were carried out on the Flickr30K, MSCOCO-1K, and MSCOCO-3K datasets. Compared with 12 other methods such as VSM and SGRAF, the proposed method improves the recall of text-to-image retrieval by 0.62% on average and the recall of image-to-text retrieval by 0.5% on average. The experimental results verify the effectiveness of the method.
Keywords: image-text relation mining; multi-head self-attention mechanism; local-global features
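The abstract describes three components: a Transformer with multi-head self-attention that models relations among image regions, an image-guided text attention module that relates image regions to the global text, and a fusion of local and global features before scoring. The following PyTorch code is a minimal sketch of that pipeline, not the authors' implementation; the module names, dimensions, mean-pooling, concatenation-based fusion, and cosine-similarity scoring are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalGlobalMatcher(nn.Module):
    """Illustrative sketch of the abstract's pipeline (assumptions throughout):
    (1) a Transformer encoder with multi-head self-attention models
        relations among image region features,
    (2) an image-guided attention step relates each image region to the
        text tokens, and
    (3) local (region-level) and global features are fused before the
        image-text similarity score is computed.
    """
    def __init__(self, dim=512, heads=8, layers=2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.region_encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        # Image-guided text attention: image regions act as queries over text tokens.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Fuse local (attended) and global image features; fusion choice is assumed.
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, regions, text_tokens):
        # regions:     (B, R, dim) pre-extracted image region features
        # text_tokens: (B, T, dim) pre-encoded word features
        regions = self.region_encoder(regions)             # relation-aware regions
        # Each region attends to the text to expose fine-grained alignments.
        attended, _ = self.cross_attn(regions, text_tokens, text_tokens)
        local = attended.mean(dim=1)                       # pooled local evidence
        global_img = regions.mean(dim=1)                   # global image feature
        fused = self.fuse(torch.cat([local, global_img], dim=-1))
        text_global = text_tokens.mean(dim=1)              # global text feature
        # Cosine similarity as the image-text matching score.
        return F.cosine_similarity(fused, text_global, dim=-1)

# Toy usage: a batch of 2 images with 36 regions and 12-token captions.
scores = LocalGlobalMatcher()(torch.randn(2, 36, 512), torch.randn(2, 12, 512))
print(scores.shape)  # torch.Size([2])
```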