Transductive zero-shot image classification based on self-supervised enhancement feature
The visual features of images play a crucial role in zero-shot image classification. Although deep features extracted by networks such as VGG, GoogLeNet, and ResNet have been widely used in image classification, their performance in zero-shot image classification is unsatisfactory. In addition, because the training and testing classes are disjoint in the zero-shot learning setting, the classification network inevitably suffers from domain shift. Therefore, a transductive zero-shot image classification framework based on self-supervised enhanced features is proposed. The main idea is as follows. First, pseudo-labels are constructed via an auxiliary task, self-supervised image features are learned through self-supervised learning, and these features are fused with unsupervised deep features. Then, the fused features are embedded into the semantic space for zero-shot image classification, which yields initial predicted labels for the unseen classes. Finally, the features and predicted labels of the unseen classes are used to iteratively optimize the visual-semantic mapping. The components of the framework are interchangeable; in the configuration evaluated here, the self-supervised network, the backbone network, and the dimensionality-reduction component are CFN, VGG16, and PCA, respectively. Experiments on the CUB, SUN, and AwA2 datasets show that the proposed framework enhances the discriminative capability of the features and performs well on zero-shot image classification tasks.
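The three-stage pipeline described above (feature fusion, semantic embedding, iterative transductive refinement) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Fusion is assumed to be concatenation of L2-normalized features followed by PCA. The visual-semantic mapping is assumed to be a ridge regression from visual features to class semantic vectors, and classification is assumed to be nearest neighbour by cosine similarity. The iterative optimization is assumed to refit the mapping on seen data plus pseudo-labelled unseen data. The function names (`fuse_and_reduce`, `fit_mapping`, `predict`, `transductive_zsl`) and the parameters `lam` and `n_iter` are hypothetical.

```python
import numpy as np


def l2_normalize(F, eps=1e-12):
    # Row-wise L2 normalization so both feature types contribute on a similar scale.
    return F / (np.linalg.norm(F, axis=1, keepdims=True) + eps)


def fuse_and_reduce(f_ssl, f_deep, n_components):
    # ASSUMPTION: fusion = concatenation of normalized self-supervised and deep
    # features, followed by PCA computed via SVD of the centered matrix.
    F = np.hstack([l2_normalize(f_ssl), l2_normalize(f_deep)])
    F = F - F.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(F, full_matrices=False)
    return F @ Vt[:n_components].T


def fit_mapping(X, S, lam=1.0):
    # ASSUMPTION: ridge-regression visual-to-semantic mapping,
    # W = argmin ||X W - S||^2 + lam * ||W||^2 = (X^T X + lam I)^-1 X^T S.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)


def predict(X, W, class_sem):
    # Embed features into the semantic space; pick the most cosine-similar class vector.
    scores = l2_normalize(X @ W) @ l2_normalize(class_sem).T
    return np.argmax(scores, axis=1)


def transductive_zsl(X_seen, y_seen, A_seen, X_unseen, A_unseen, n_iter=5, lam=1.0):
    # Initial mapping learned from labelled seen-class data only.
    W = fit_mapping(X_seen, A_seen[y_seen], lam)
    y_pred = predict(X_unseen, W, A_unseen)  # initial unseen-class labels
    for _ in range(n_iter):
        # Refit on seen data plus unseen data with its current pseudo-labels.
        X_all = np.vstack([X_seen, X_unseen])
        S_all = np.vstack([A_seen[y_seen], A_unseen[y_pred]])
        W = fit_mapping(X_all, S_all, lam)
        y_pred = predict(X_unseen, W, A_unseen)
    return y_pred
```

Each refit includes the unseen features with their current pseudo-labels, so the mapping is pulled toward the unseen-class feature distribution. This is one simple way to act on the domain-shift motivation; the paper's actual optimization procedure may differ.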