Image-Text Retrieval Algorithm with Dynamic Multi-View Reasoning over Hierarchical Similarity
Cross-modal image-text retrieval usually concerns visible-light images and ordinary text. Scalar-based image-text similarity is limited: a single score cannot fully represent cross-modal alignment. Moreover, local region-word correlations and global image-text dependencies interact in complex ways, so the modules that reason over the two modalities' features carry a degree of uncertainty. To address these problems, this paper proposes a dynamic multi-view reasoning method for image-text matching based on a hierarchical similarity network. First, the method computes both global and local similarities in scalar and vector form. Second, four types of units are designed as basic building blocks to explore global-local similarity interaction. Finally, a learnable selection confidence mechanism is introduced. Experiments on the Flickr30K and MSCOCO datasets demonstrate the excellent performance of the algorithm.
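The scalar-versus-vector distinction can be illustrated with a minimal sketch. This is not the paper's implementation; it only assumes that the scalar view is a cosine score between feature vectors and that the vector view keeps the per-dimension contributions to that score instead of collapsing them, so later reasoning modules can weigh each dimension:

```python
import numpy as np

def scalar_similarity(img, txt):
    # Scalar view: one cosine score between a global image feature
    # and a global text feature -- compact but lossy.
    return float(img @ txt / (np.linalg.norm(img) * np.linalg.norm(txt)))

def vector_similarity(img, txt):
    # Vector view (illustrative assumption): keep the element-wise
    # contributions to the cosine score as a similarity vector, so
    # per-dimension alignment information is preserved.
    return (img * txt) / (np.linalg.norm(img) * np.linalg.norm(txt))

rng = np.random.default_rng(0)
img_feat = rng.standard_normal(8)   # hypothetical global image feature
txt_feat = rng.standard_normal(8)   # hypothetical global text feature

s = scalar_similarity(img_feat, txt_feat)
v = vector_similarity(img_feat, txt_feat)
# The vector view refines the scalar view: its entries sum to the cosine score.
assert np.isclose(v.sum(), s)
```

In this reading, the hierarchical network would compute such similarities at both the global (image-sentence) and local (region-word) levels before the reasoning units aggregate them.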