CROSS-MODAL HASH RETRIEVAL BASED ON TRANSFORMER GENERATIVE ADVERSARIAL NETWORKS
Considering the advantages of Generative Adversarial Networks in preserving the manifold structure among cross-modal data, and combining the self-attention mechanism of the Transformer with its freedom from convolution, a cross-modal hashing method based on a Transformer Generative Adversarial Network is proposed. First, the Vision Transformer framework is pre-trained on the ImageNet dataset and used as the backbone network for image feature extraction. Then, the features of each modality are partitioned into shared features and private features. Next, an adversarial learning module is constructed to align the distribution and semantic consistency of the shared features across modalities, while increasing the distributional and semantic inconsistency of the private features across modalities. Finally, the general feature representation is mapped into a compact hash code to achieve cross-modal hash retrieval. Experimental results show that the proposed algorithm outperforms the comparison algorithms on public datasets.
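The retrieval stage described above can be illustrated with a minimal sketch: each modality's feature vector is split into shared and private parts, the shared part is projected and binarized with the sign function into a compact hash code, and retrieval ranks database items by Hamming distance. All dimensions, the split point, and the random projection here are hypothetical placeholders, not the paper's actual learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_features(x, shared_dim):
    # Partition a modality's features into shared and private parts
    # (the split point is a hypothetical design choice for illustration).
    return x[:, :shared_dim], x[:, shared_dim:]

def hash_codes(shared, W):
    # Project the shared (general) representation and binarize with sign
    # to obtain compact {-1, +1} hash codes; W stands in for a learned layer.
    return np.sign(shared @ W)

def hamming_rank(query_code, db_codes):
    # Rank database items by Hamming distance to the query code;
    # for {-1, +1} codes, distance = (bits - inner product) / 2.
    dist = (query_code.shape[0] - db_codes @ query_code) / 2
    return np.argsort(dist)

# Toy example: 512-d image/text features, 256 shared dims, 32-bit codes.
img = rng.standard_normal((4, 512))
txt = img + 0.05 * rng.standard_normal((4, 512))  # semantically aligned texts
W = rng.standard_normal((256, 32))

img_shared, img_private = split_features(img, 256)
txt_shared, txt_private = split_features(txt, 256)
img_codes = hash_codes(img_shared, W)
txt_codes = hash_codes(txt_shared, W)

# Cross-modal retrieval: rank image codes against each text query code.
rankings = [hamming_rank(txt_codes[i], img_codes) for i in range(4)]
```

Because the shared features of paired samples are (by construction here) nearly identical, their hash codes differ in few bits, so paired items rank near the top; in the proposed method this alignment is enforced by the adversarial learning module rather than by adding noise.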