Survey of text-to-image synthesis

Cao Yin (曹寅)¹, Qin Junping (秦俊平)¹, Ma Qianli (马千里)¹, Sun Hao (孙昊)¹, Yan Kai (闫凯)¹, Wang Lei (王磊)¹, Ren Jiaqi (任家琪)¹

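Author information

1. College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010000, Inner Mongolia; Inner Mongolia Autonomous Region Engineering Technology Research Center of Big Data-Based Software Service, Hohhot 010000, Inner Mongolia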

Abstract

Research on the text-to-image generation task was comprehensively reviewed and organized. Based on the underlying principle of image generation, text-to-image methods were classified into three major categories: methods based on the generative adversarial network (GAN) architecture, methods based on the autoregressive model architecture, and methods based on the diffusion model architecture. The GAN-based methods were further grouped into six subcategories according to the technique each one improves: multi-level nested architectures, attention mechanisms, siamese networks, cycle-consistency methods, deep fusion of text features, and improved unconditional models. Drawing on the analysis of these methods, the evaluation metrics and datasets commonly used for text-to-image generation were summarized and discussed.

Key words

AI-generated content/text-to-image/generative adversarial network/autoregressive model/diffusion model

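Funding

National Natural Science Foundation of China (61962044)

Natural Science Foundation of Inner Mongolia Autonomous Region (2019MS06005)

Science and Technology Major Project of Inner Mongolia Autonomous Region (2021ZD0015)

Basic Scientific Research Funds for Universities Directly Under the Autonomous Region (JY20220327)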

Publication year

2024

Journal of Zhejiang University (Engineering Science)

Zhejiang University

Indexed in: CSTPCD, CSCD, Peking University Core Journals

Impact factor: 0.625

ISSN: 1008-973X

Citations: 1

References: 92
