A comprehensive evaluation and categorization of text-to-image generation tasks was conducted. The tasks were classified into three major categories according to the underlying principle of image generation: methods based on the generative adversarial network (GAN) architecture, methods based on the autoregressive model architecture, and methods based on the diffusion model architecture. GAN-based methods were further divided into six subcategories according to the aspect they improve: adoption of multi-level hierarchical architectures, application of attention mechanisms, utilization of Siamese networks, incorporation of cycle-consistency methods, deep fusion of text features, and enhancement of unconditional models. Through analysis of the different methods, the common evaluation metrics and datasets used by existing text-to-image methods were summarized and discussed.
AI-generated content; text-to-image; generative adversarial network; autoregressive model; diffusion model