Fine-granularity Text-Guided Cross-modality Style Transfer
By exploiting the disentangled representations of StyleGANs and the semantic correspondence between modalities in multimodal pre-trained models, existing methods have achieved good results in cross-modality style transfer. However, the latent space of StyleGANs, which is organized by image-scale decomposition, is poorly suited to editing local attributes, so irrelevant regions can be disturbed during transfer. We propose a fine-granularity text-guided cross-modality style transfer model that achieves locally controllable style transfer by exploiting the regional information in the prompt text. First, a BERT-based text semantic classification network locates the semantic regions mentioned in the target style text. Then, a feature mapping network embeds the CLIP features of the target text into the latent space of SemanticStyleGAN. Together, the text semantic classification network and the feature mapping network enable fine-granularity embedding of the target text's CLIP features into an editable latent space. Finally, the problem of adversarial artifacts during training is addressed by randomly augmenting the generated stylized images with perspective transformations. Experiments show that the proposed method generates images that match the prompt text style more closely and improves the regional accuracy of cross-modality editing.
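The abstract gives no implementation details, but the fine-granularity embedding step can be pictured as a per-region mapper gated by the region mask from the BERT classifier. The following PyTorch sketch is an illustrative assumption, not the authors' code: the module name RegionMapper, the region count, and all layer sizes are hypothetical.

```python
import torch
import torch.nn as nn

N_REGIONS = 13    # assumed number of local parts in SemanticStyleGAN (faces)
CLIP_DIM = 512    # CLIP ViT-B/32 text embedding size
LATENT_DIM = 512  # assumed per-region latent code size


class RegionMapper(nn.Module):
    """Map a CLIP text feature to per-region latent offsets, gated by the
    0/1 region mask predicted by the BERT text semantic classifier."""

    def __init__(self):
        super().__init__()
        # One small MLP per semantic region of SemanticStyleGAN.
        self.mappers = nn.ModuleList(
            nn.Sequential(
                nn.Linear(CLIP_DIM, LATENT_DIM),
                nn.LeakyReLU(0.2),
                nn.Linear(LATENT_DIM, LATENT_DIM),
            )
            for _ in range(N_REGIONS)
        )

    def forward(self, text_feat, region_mask):
        # text_feat:   (B, CLIP_DIM)  CLIP embedding of the style prompt
        # region_mask: (B, N_REGIONS) regions the prompt refers to
        deltas = torch.stack([m(text_feat) for m in self.mappers], dim=1)
        # Zero the offsets of regions the prompt does not mention, so their
        # latent codes, and hence their appearance, stay untouched.
        return deltas * region_mask.unsqueeze(-1)
```

Under this reading, editing amounts to `w_edited = w + mapper(text_feat, mask)`, which leaves the local latents of unmentioned regions exactly as they were.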
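Likewise, the perspective-augmentation step can be sketched as averaging the CLIP loss over several randomly warped views of each generated image, so that a single adversarial pixel pattern cannot minimize the loss on its own. This is a minimal sketch using OpenAI's CLIP package and torchvision; the view count, distortion scale, and crop range are illustrative choices, and CLIP's input normalization is omitted for brevity.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

# Random perspective warp plus crop applied before CLIP scores the image.
augment = transforms.Compose([
    transforms.RandomPerspective(distortion_scale=0.5, p=1.0),
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
])


def augmented_clip_loss(images, text_tokens, n_views=8):
    """Average the CLIP text-image loss over several augmented views."""
    text_feat = clip_model.encode_text(text_tokens)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    loss = 0.0
    for _ in range(n_views):
        views = augment(images)  # a fresh random perspective per view
        img_feat = clip_model.encode_image(views)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        loss = loss + (1 - (img_feat * text_feat).sum(dim=-1)).mean()
    return loss / n_views


# Usage: text_tokens = clip.tokenize(["oil painting style"]).to(device)
```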