
Image-to-Image Translation Based on CLIP and Dual-Spatially Adaptive Normalization
Most existing image-to-image translation methods rely on dataset domain labels, which often limits their range of application. Methods for fully unsupervised image-to-image translation remove this dependence on domain labels, but they commonly suffer from loss of source-domain information. To address these two problems, an unsupervised image-to-image translation model based on Contrastive Language-Image Pre-training (CLIP) is proposed. First, a CLIP similarity loss is introduced to constrain the style features of images, strengthening the model's ability to transfer style information accurately without using dataset domain labels. Second, Adaptive Instance Normalization (AdaIN) is improved into a new Dual-Spatially Adaptive Instance Normalization (DSAdaIN) module, which adds a learned, adaptive interaction to the feature stylization stage to better preserve source-domain content information. Finally, a discriminator contrastive loss is designed to balance the training and optimization of the adversarial loss. Experimental results on multiple public datasets show that, compared with models such as StarGANv2 and StyleDIS, the proposed model transfers image style information accurately while retaining a degree of source-domain information, improving the Fréchet Inception Distance (FID) and Kernel Inception Distance (KID) scores by approximately 3.35 and 0.57×10⁻², respectively, and achieving good image-to-image translation performance.
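Note: no code accompanies this abstract page. As a rough PyTorch illustration of two components named above, the sketch below shows (i) standard AdaIN, the baseline that the proposed DSAdaIN module extends, and (ii) one plausible cosine-distance form of a CLIP similarity loss built on the public `clip` package (`clip.load`, `encode_image`). The backbone choice (ViT-B/32), the loss form, and the function names here are assumptions made for illustration, not the authors' implementation; the paper's actual DSAdaIN design is not reproduced.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Standard AdaIN (Huang & Belongie, 2017): re-normalize each channel of the
    content features to the channel-wise mean/std of the style features.
    Per the abstract, DSAdaIN replaces this fixed statistics transfer with a
    learned, adaptive interaction; that design is not reproduced here."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean

def clip_similarity_loss(clip_model, generated: torch.Tensor, style_ref: torch.Tensor) -> torch.Tensor:
    """Hypothetical cosine-distance form of the CLIP similarity loss: pull the
    CLIP image embedding of the translated output toward that of the style
    reference, so style supervision comes from CLIP rather than from dataset
    domain labels. Inputs must already be CLIP-preprocessed (224x224)."""
    emb_gen = F.normalize(clip_model.encode_image(generated).float(), dim=-1)
    emb_ref = F.normalize(clip_model.encode_image(style_ref).float(), dim=-1)
    return 1.0 - (emb_gen * emb_ref).sum(dim=-1).mean()

if __name__ == "__main__":
    model, _ = clip.load("ViT-B/32", device="cpu")
    fake = torch.randn(2, 3, 224, 224)  # stand-ins for generated / reference images
    ref = torch.randn(2, 3, 224, 224)
    print(clip_similarity_loss(model, fake, ref).item())
    print(adain(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)).shape)
```

In a real training loop, the generated and reference images would be differentiably resized and normalized to CLIP's input statistics before encoding, so the loss can back-propagate into the generator.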

image-to-image translation; Generative Adversarial Network (GAN); Contrastive Language-Image Pre-training (CLIP) model; Adaptive Instance Normalization (AdaIN); contrastive learning

李田芳 (Li Tianfang), 普园媛 (Pu Yuanyuan), 赵征鹏 (Zhao Zhengpeng), 徐丹 (Xu Dan), 钱文华 (Qian Wenhua)


School of Information Science and Engineering, Yunnan University, Kunming 650504, Yunnan, China

University Key Laboratory of Internet of Things Technology and Application of Yunnan Province, Kunming 650500, Yunnan, China


Funding: National Natural Science Foundation of China (61163019, 61271361, 61761046, U1802271, 61662087, 62061049); Yunnan Provincial Department of Science and Technology Projects (2014FA021, 2018FB100); Key Projects of the Yunnan Provincial Applied Basic Research Program (202001BB050043, 2019FA044); Yunnan Provincial Major Science and Technology Special Program (202002AD080001); Yunnan Provincial Reserve Talent Program for Young and Middle-aged Academic and Technical Leaders (2019HB121)

2024

Computer Engineering (计算机工程)
East China Institute of Computing Technology; Shanghai Computer Society

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.581
ISSN: 1000-3428
Year, Volume (Issue): 2024, 50(5)