
Image-Text Retrieval Algorithm with Dynamic Multi-View Reasoning over Hierarchical Similarity

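Cross-modal image-text retrieval generally concerns visible-light images and ordinary text. In this setting, scalar-based image-text similarity is limited and cannot fully express cross-modal alignment. Moreover, local region-word correlations and global image-text dependencies interact in complex ways, so the modules that reason over the features of the two modalities are subject to a degree of uncertainty. To address these problems, the paper proposes a dynamic multi-view reasoning method for image-text matching built on a hierarchical similarity network. First, the method uses both scalar-based and vector-based similarities at the global and local levels. Second, four types of units are designed as the basic building blocks for exploring interactions between global and local similarities. Finally, a learnable selection confidence mechanism is introduced. Experiments on the Flickr30K and MSCOCO datasets demonstrate the excellent performance of the algorithm.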
Image-text Retrieval Algorithm of Dynamic Multi-view Reasoning Hierarchical Similarity
Cross-modal image-text retrieval usually refers to retrieval between visible-light images and ordinary text. Scalar-based image-text similarity has limitations and cannot fully represent cross-modal alignment. At the same time, there is a complex interaction between local region-word correlation and global image-text dependence, so the modules used to reason over the features of the two modalities have a certain degree of uncertainty. To address these problems, this paper proposes a dynamic multi-view reasoning method for image-text matching based on a hierarchical similarity network. Firstly, the method uses scalar-based and vector-based similarities at both the global and local levels. Secondly, four types of units are designed as the basic units for exploring global-local similarity interaction. Finally, a learnable selection confidence mechanism is introduced, and experiments on the Flickr30K and MSCOCO datasets show the excellent performance of the algorithm.
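The abstract does not define its similarity measures, so the following is only a minimal sketch of the general distinction between a scalar similarity (one number per image-text pair) and a vector similarity (one vector per pair), at the global and local levels. Every function name, the squared-difference form of the vector similarity, the best-region matching rule, and the toy feature values are illustrative assumptions rather than the paper's formulation. Learned projections, the four unit types, and the selection confidence mechanism are omitted.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v, eps=1e-8):
    # Scale a vector to (approximately) unit length; eps avoids division by zero.
    norm = math.sqrt(dot(v, v))
    return [x / (norm + eps) for x in v]

def scalar_similarity(a, b):
    # Cosine similarity: a single number summarising how well a and b align.
    return dot(l2_normalize(a), l2_normalize(b))

def vector_similarity(a, b):
    # Assumed form: element-wise squared difference, L2-normalized.
    # It keeps one value per feature dimension instead of collapsing to a scalar.
    diff = [(x - y) ** 2 for x, y in zip(a, b)]
    return l2_normalize(diff)

def local_scalar_similarity(regions, words):
    # Assumed matching rule: each word is paired with its best-matching region,
    # and the per-word scores are averaged.
    best = [max(scalar_similarity(r, w) for r in regions) for w in words]
    return sum(best) / len(best)

# Toy features (made-up numbers, not taken from any dataset).
image_global = [1.0, 0.0, 1.0]
text_global = [1.0, 0.0, 1.0]
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
words = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]

global_scalar = scalar_similarity(image_global, text_global)
global_vector = vector_similarity([1.0, 0.0, 0.0], [0.0, 0.0, 0.0])
local_scalar = local_scalar_similarity(regions, words)
```

In this sketch the scalar form discards which feature dimensions disagree, while the vector form retains that information for later reasoning modules, which is the limitation of scalar similarity that the abstract points to.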

cross-modal retrieval; image-text matching; dynamic interaction algorithm; similarity prediction

Zhang Shuming (张书铭)

School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China

cross-modal retrieval; image-text matching; dynamic interaction algorithm; similarity prediction

2024

Modern Information Technology (现代信息科技)
Guangdong Institute of Electronics (广东省电子学会)

ISSN:2096-4706
Year, Volume (Issue): 2024, 8(17)