A comprehensive evaluation and categorization of text-to-image generation tasks was conducted. According to the underlying principle of image generation, these tasks were classified into three major categories: text-to-image generation based on the generative adversarial network architecture, text-to-image generation based on the autoregressive model architecture, and text-to-image generation based on the diffusion model architecture. Methods based on the generative adversarial network architecture were further divided into six subcategories according to the aspect they improve: adoption of multi-level hierarchical architectures, application of attention mechanisms, utilization of siamese networks, incorporation of cycle-consistency methods, deep fusion of text features, and enhancement of unconditional models. Through an analysis of the different methods, the evaluation metrics and datasets commonly used by existing text-to-image methods were summarized and discussed.
Key words
AI-generated content/text-to-image/generative adversarial network/autoregressive model/diffusion model