CROSS-MODAL HASH RETRIEVAL BASED ON TRANSFORMER GENERATIVE ADVERSARIAL NETWORKS
Considering the advantages of Generative Adversarial Networks in preserving the manifold structure among cross-modal data, and combining the self-attention mechanism of the Transformer with its freedom from convolution, a cross-modal hashing method based on a Transformer Generative Adversarial Network is proposed. First, the Vision Transformer framework is pre-trained on the ImageNet dataset and used as the backbone network for image feature extraction. Then, the features of each modality are partitioned into shared features and private features. Next, an adversarial learning module is constructed to align the distribution and semantic consistency of the shared features across modalities, while increasing the distributional and semantic inconsistency of the private features across modalities. Finally, the general feature representation is mapped into a compact hash code to achieve cross-modal hash retrieval. Experimental results show that the proposed algorithm outperforms the comparison algorithms on public datasets.
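The retrieval stage described above can be illustrated with a minimal sketch: each modality's feature vector is split into shared and private parts, the shared part is projected and binarized with the sign function into a compact hash code, and retrieval ranks database items by Hamming distance. All dimensions, the split point, and the random projection here are hypothetical placeholders, not the paper's actual learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_features(x, shared_dim):
    # Partition a modality's features into shared and private parts
    # (the split point is a hypothetical design choice for illustration).
    return x[:, :shared_dim], x[:, shared_dim:]

def hash_codes(shared, W):
    # Project the shared (general) representation and binarize with sign
    # to obtain compact {-1, +1} hash codes; W stands in for a learned layer.
    return np.sign(shared @ W)

def hamming_rank(query_code, db_codes):
    # Rank database items by Hamming distance to the query code;
    # for {-1, +1} codes, distance = (bits - inner product) / 2.
    dist = (query_code.shape[0] - db_codes @ query_code) / 2
    return np.argsort(dist)

# Toy example: 512-d image/text features, 256 shared dims, 32-bit codes.
img = rng.standard_normal((4, 512))
txt = img + 0.05 * rng.standard_normal((4, 512))  # semantically aligned texts
W = rng.standard_normal((256, 32))

img_shared, img_private = split_features(img, 256)
txt_shared, txt_private = split_features(txt, 256)
img_codes = hash_codes(img_shared, W)
txt_codes = hash_codes(txt_shared, W)

# Cross-modal retrieval: rank image codes against each text query code.
rankings = [hamming_rank(txt_codes[i], img_codes) for i in range(4)]
```

Because the shared features of paired samples are (by construction here) nearly identical, their hash codes differ in few bits, so paired items rank near the top; in the proposed method this alignment is enforced by the adversarial learning module rather than by adding noise.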