Fine-granularity Text-Guided Cross-modality Style Transfer
By exploiting the disentangled representations of StyleGANs and the semantic correspondence between modalities in multimodal pre-trained models, existing methods have achieved good results in cross-modality style transfer. However, the latent space of StyleGANs, which is organized by image-scale decomposition, is poorly suited to editing local attributes, so irrelevant regions can be disturbed during transfer. We propose a fine-granularity text-guided cross-modality style transfer model that achieves locally controllable style transfer by exploiting the regional information in the prompt text. First, a BERT-based text semantic classification network locates the semantic regions mentioned in the target style text. Then, a feature mapping network embeds the CLIP features of the target text into the latent space of SemanticStyleGAN. Together, the text semantic classification network and the feature mapping network enable fine-granularity embedding of the target text's CLIP features into an editable latent space. Finally, the problem of adversarial artifacts during training is addressed by randomly augmenting the generated stylized images with perspective transformations. Experiments show that the proposed method generates images that match the prompt text style more closely and improves the regional accuracy of cross-modality editing.
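The abstract gives no implementation details, but the fine-granularity embedding step can be pictured as a per-region mapper gated by the region mask from the BERT classifier. The following PyTorch sketch is an illustrative assumption, not the authors' code: the module name RegionMapper, the region count, and all layer sizes are hypothetical.

```python
import torch
import torch.nn as nn

N_REGIONS = 13    # assumed number of local parts in SemanticStyleGAN (faces)
CLIP_DIM = 512    # CLIP ViT-B/32 text embedding size
LATENT_DIM = 512  # assumed per-region latent code size


class RegionMapper(nn.Module):
    """Map a CLIP text feature to per-region latent offsets, gated by the
    0/1 region mask predicted by the BERT text semantic classifier."""

    def __init__(self):
        super().__init__()
        # One small MLP per semantic region of SemanticStyleGAN.
        self.mappers = nn.ModuleList(
            nn.Sequential(
                nn.Linear(CLIP_DIM, LATENT_DIM),
                nn.LeakyReLU(0.2),
                nn.Linear(LATENT_DIM, LATENT_DIM),
            )
            for _ in range(N_REGIONS)
        )

    def forward(self, text_feat, region_mask):
        # text_feat:   (B, CLIP_DIM)  CLIP embedding of the style prompt
        # region_mask: (B, N_REGIONS) regions the prompt refers to
        deltas = torch.stack([m(text_feat) for m in self.mappers], dim=1)
        # Zero the offsets of regions the prompt does not mention, so their
        # latent codes, and hence their appearance, stay untouched.
        return deltas * region_mask.unsqueeze(-1)
```

Under this reading, editing amounts to `w_edited = w + mapper(text_feat, mask)`, which leaves the local latents of unmentioned regions exactly as they were.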
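Likewise, the perspective-augmentation step can be sketched as averaging the CLIP loss over several randomly warped views of each generated image, so that a single adversarial pixel pattern cannot minimize the loss on its own. This is a minimal sketch using OpenAI's CLIP package and torchvision; the view count, distortion scale, and crop range are illustrative choices, and CLIP's input normalization is omitted for brevity.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

# Random perspective warp plus crop applied before CLIP scores the image.
augment = transforms.Compose([
    transforms.RandomPerspective(distortion_scale=0.5, p=1.0),
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
])


def augmented_clip_loss(images, text_tokens, n_views=8):
    """Average the CLIP text-image loss over several augmented views."""
    text_feat = clip_model.encode_text(text_tokens)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    loss = 0.0
    for _ in range(n_views):
        views = augment(images)  # a fresh random perspective per view
        img_feat = clip_model.encode_image(views)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        loss = loss + (1 - (img_feat * text_feat).sum(dim=-1)).mean()
    return loss / n_views


# Usage: text_tokens = clip.tokenize(["oil painting style"]).to(device)
```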