Cross-Modal Hash Retrieval Algorithm Based on Transformer Generative Adversarial Networks

Generative adversarial networks are good at preserving the manifold structure across cross-modal data, and the Transformer relies on self-attention without any convolution. Combining these advantages, this paper proposes a cross-modal hash retrieval algorithm based on a Transformer generative adversarial network. First, a Vision Transformer is pre-trained on the ImageNet dataset and used as the backbone network for image feature extraction. The features of each modality are then split into shared features and private features. Next, an adversarial learning module is constructed that reduces the distribution distance between the shared features of different modalities while preserving their semantic consistency, and at the same time enlarges the distribution distance between the private features of different modalities while preserving their semantic inconsistency. Finally, the general feature representation is mapped to compact hash codes for cross-modal hash retrieval. Experimental results on public datasets show that the proposed algorithm outperforms the comparison algorithms.
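The final step of the abstract, mapping feature vectors to compact hash codes and retrieving across modalities, can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed matrix `PROJECTION` is a hypothetical stand-in for a learned hashing layer, the feature vectors are made-up toy values, and the names `sign_hash`, `hamming`, and `retrieve` are chosen here for illustration only. The sketch assumes that image and text features already live in one shared space.

```python
# Toy cross-modal hash retrieval: binarize features, then rank by Hamming distance.
# PROJECTION is a hand-written stand-in (assumption) for a learned hashing layer.
PROJECTION = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]

def sign_hash(features, projection=PROJECTION):
    """Return one bit per projection row: 1 if the dot product is >= 0, else 0."""
    return [
        1 if sum(w * x for w, x in zip(row, features)) >= 0 else 0
        for row in projection
    ]

def hamming(a, b):
    """Count the positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_code, database_codes, top_k=3):
    """Return indices of the top_k database codes closest to the query."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )
    return ranked[:top_k]

# Hypothetical shared-space features for three text items and one image query.
text_features = [
    [1.0, -2.0, 0.5, -1.0],
    [-1.0, 2.0, -0.5, 1.0],
    [2.0, 1.0, -1.0, -3.0],
]
image_query = [0.9, -1.8, 0.4, -1.2]

db_codes = [sign_hash(f) for f in text_features]
query_code = sign_hash(image_query)
ranking = retrieve(query_code, db_codes)
print(ranking)  # [0, 2, 1]
```

The image query lands on the same 6-bit code as the first text item, so that item ranks first at Hamming distance 0. In a trained system the projection would be learned so that semantically matching image and text pairs receive nearby codes.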

Transformer; generative adversarial network; cross-modal retrieval; hash coding; semantic preservation

Lei Lei (雷蕾), Xu Liming (徐黎明)

School of Computer and Software, Nanyang Institute of Technology, Nanyang, Henan 473004, China

School of Computer Science, China West Normal University, Nanchong, Sichuan 637002, China

2024

Journal of Nanyang Institute of Technology
Nanyang Institute of Technology

CHSSCD
Impact factor: 0.178
ISSN: 1674-5132
Year, volume (issue): 2024, 16(4)