Generative adversarial network-based two-stage generation of high-quality images from text
A generative adversarial network with deep fusion attention (DFA-GAN) was proposed, using multiple loss functions as constraints, to address the poor image quality and the inconsistency between text descriptions and generated images in traditional text-to-image generation methods. A two-stage image generation process was employed, with a single-level generative adversarial network (GAN) as the backbone. The initial blurry image generated in the first stage was fed into the second stage, where the image was regenerated at high quality to enhance the overall generation quality. In the first stage, a visual-text fusion module was designed to deeply integrate text features with image features, so that text information was adequately fused during image sampling at different scales. In the second stage, an image generator with an improved Vision Transformer (ViT) as its encoder was proposed to fully fuse image features with the word-level features of the text description. Quantitative and qualitative experimental results showed that the proposed method outperformed other mainstream models in both image quality and alignment with the text descriptions.
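The core idea of fusing word-level text features into image features at each scale can be illustrated with a minimal cross-attention-style sketch in NumPy. All names, shapes, and the attention formulation below are illustrative assumptions for exposition; the abstract does not specify DFA-GAN's actual fusion module, which may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def visual_text_fusion(image_feats, word_feats):
    """Illustrative visual-text fusion (assumed, not the authors' code).

    Each image region attends over all word embeddings, and the
    attended text context is added back to the region feature, so
    text information is injected into the image representation.

    image_feats: (num_regions, dim) -- flattened feature map at one scale
    word_feats:  (num_words, dim)   -- word-level text features
    """
    dim = image_feats.shape[1]
    scores = image_feats @ word_feats.T / np.sqrt(dim)  # (regions, words)
    attn = softmax(scores, axis=-1)                     # weights over words
    context = attn @ word_feats                         # text context per region
    return image_feats + context                        # fused features

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 64))   # e.g. a 4x4 feature map, 64-dim regions
txt = rng.standard_normal((12, 64))   # 12 word embeddings
fused = visual_text_fusion(img, txt)
print(fused.shape)  # (16, 64): same spatial layout, text-conditioned features
```

In a multi-scale generator, a fusion step like this would be applied after each upsampling block, with `image_feats` re-flattened from the current feature map, so the text conditions the image at every resolution.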