Advances in Text-to-Image Generation Methods

Wang Peng¹


Author information

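1. School of Artificial Intelligence (School of Future Technology), Nanjing University of Information Science and Technology, Nanjing 210044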

Abstract

Cross-modal learning is a long-standing research topic in artificial intelligence. Generating images from text descriptions has become a popular research area in recent years; the main task is to generate images that are highly relevant to a given text description. This paper summarizes the current state of research and the latest progress in text-to-image generation. By generation framework, the models are divided into methods based on generative adversarial networks (GANs) and non-GAN methods. By training strategy, the GAN-based methods are further subdivided into single-stage, multi-stage, and additional-supervision categories, and several classic non-GAN methods are also introduced. Finally, the paper presents the datasets and evaluation metrics used for the text-to-image task, identifies the shortcomings and unsolved problems of current methods, and points out directions for future research.

Key words

text-to-image generation/generative adversarial networks/diffusion models/single-stage generation/multi-stage generation


Year of publication

2024

Information Technology

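Heilongjiang Information Technology Society; China Center for Information Industry Development; Electronic Information Center, Ministry of Information Industry of China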


CSTPCD

Impact factor: 0.413

ISSN: 1009-2552
