Cross-modal learning is one of the medium- and long-term research topics in artificial intelligence. Image generation from text descriptions has become an active research area in recent years; its main task is to generate images that are semantically consistent with a given text description. This paper surveys the current state and latest progress of research on text-to-image generation. By generation framework, existing models are divided into methods based on generative adversarial networks (GANs) and non-GAN methods. By training strategy, GAN-based methods are further subdivided into single-stage, multi-stage, and additionally supervised categories, and several classic non-GAN methods are also introduced. Finally, the paper presents the datasets and evaluation metrics used for text-to-image generation, discusses the shortcomings and unsolved problems of current methods, and outlines directions for future research.
Keywords: text-to-image generation; generative adversarial networks; diffusion models; single-stage generation; multi-stage generation