Cross-modal learning is one of the medium- and long-term research topics in artificial intelligence. Text-to-image generation has become an active research area in recent years; its main task is to generate images that are highly relevant to a given text description. This paper surveys the current state and latest progress of research on text-to-image generation. By generation framework, existing models are divided into methods based on the generative adversarial network (GAN) framework and non-GAN methods. By training strategy, GAN-based methods are further subdivided into single-stage, multi-stage, and additional-supervision categories, and some classic non-GAN methods are also introduced. Finally, the datasets and evaluation metrics used in text-to-image generation are presented, the shortcomings and open problems of current methods are discussed, and directions for future research are pointed out.
Key words
text-to-image generation/generative adversarial networks/diffusion models/single-stage generation/multi-stage generation