Transductive zero-shot image classification based on self-supervised enhanced features

Visual features play a crucial role in zero-shot image classification. Although deep features extracted by networks such as VGG, GoogLeNet, and ResNet are widely used in image classification, their performance on zero-shot image classification is not ideal and leaves considerable room for improvement. In addition, because the training and test sets are disjoint in the zero-shot learning setting, the classification network inevitably suffers from domain shift. Therefore, a transductive zero-shot image classification framework based on self-supervised enhanced features is proposed. First, pseudo-labels are constructed through an auxiliary task, and self-supervised learning is used to obtain self-supervised image features, which are then fused with unsupervised deep features. Next, the fused features are embedded into the semantic space for zero-shot image classification, yielding initial predicted labels for the unseen classes. Finally, the unseen-class features and their predicted labels are used to iteratively optimize the visual-semantic mapping. The components of the framework are interchangeable; here, the self-supervised network, backbone network, and dimensionality-reduction network are CFN, VGG16, and PCA, respectively. Experiments on the CUB, SUN, and AwA2 datasets show that the proposed network enhances the discriminative capability of the features and performs well on zero-shot image classification.
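The three stages described in the abstract (feature fusion, embedding into the semantic space, and iterative refinement of the visual-semantic mapping) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the mapping or the update rule, so the following are assumptions made for illustration: fusion by concatenation, PCA applied to the fused features, a ridge-regression mapping, cosine-similarity classification against class attribute vectors, and refitting on seen data plus pseudo-labelled unseen data. All function and parameter names are hypothetical, and the feature arrays stand in for the outputs of pretrained networks such as VGG16 and CFN.

```python
import numpy as np

def pca_fit(X, k):
    """Return the mean and the top-k principal directions (columns) of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def ridge_map(X, S, lam=1.0):
    """Closed-form ridge regression from visual features X to semantics S (assumed mapping)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)

def predict(X, W, prototypes):
    """Assign each sample to the class prototype with the highest cosine similarity."""
    P = X @ W
    P = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-12)
    A = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-12)
    return np.argmax(P @ A.T, axis=1)

def transductive_zsl(deep_seen, ssl_seen, y_seen, attrs_seen,
                     deep_unseen, ssl_unseen, attrs_unseen,
                     k=16, lam=1.0, n_iter=5):
    # Stage 1: fuse deep and self-supervised features (concatenation is an assumption).
    fused_seen = np.hstack([deep_seen, ssl_seen])
    fused_unseen = np.hstack([deep_unseen, ssl_unseen])
    # PCA fitted on all available features, which the transductive setting permits.
    mean, comps = pca_fit(np.vstack([fused_seen, fused_unseen]), k)
    Xs = (fused_seen - mean) @ comps
    Xu = (fused_unseen - mean) @ comps

    # Stage 2: learn the mapping on seen classes and get initial unseen predictions.
    S_seen = attrs_seen[y_seen]
    W = ridge_map(Xs, S_seen, lam)
    y_pred = predict(Xu, W, attrs_unseen)

    # Stage 3: iteratively refit the mapping using the unseen features and
    # their current predicted labels, stopping early once predictions are stable.
    for _ in range(n_iter):
        X_all = np.vstack([Xs, Xu])
        S_all = np.vstack([S_seen, attrs_unseen[y_pred]])
        W = ridge_map(X_all, S_all, lam)
        new_pred = predict(Xu, W, attrs_unseen)
        if np.array_equal(new_pred, y_pred):
            break
        y_pred = new_pred
    return y_pred
```

The returned labels index the rows of `attrs_unseen`. Because pseudo-labels feed back into the mapping, early mistakes can reinforce themselves; a practical variant might keep only high-confidence predictions in each round, which this sketch omits.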

zero-shot learning; self-supervised learning; transductive; visual-semantic mapping; feature fusion; image classification

Wang Haoyu (王浩宇), Zhang Xinran (张欣然), Wang Xuesong (王雪松), Cheng Yuhu (程玉虎)

School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China

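National Natural Science Foundation of China (62176259, 61976215); Natural Science Foundation of Jiangsu Province (BK20221116); Jiangsu Funding Program for Excellent Postdoctoral Talent (2022ZB530)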

2024

Control and Decision (控制与决策)
Northeastern University


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.227
ISSN: 1001-0920
Year, volume (issue): 2024, 39(5)