Text-to-video Generation: Research Status, Progress and Challenges
Text-to-video generation aims to produce videos that are semantically consistent, photo-realistic, temporally coherent, and logically sound from user-provided textual descriptions. This paper first surveys the current state of research in text-to-video generation, giving a detailed overview of three mainstream families of methods: those based on recurrent networks and Generative Adversarial Networks (GANs), those based on Transformers, and those based on diffusion models. Each family has its strengths and weaknesses in video generation. Methods based on recurrent networks and GANs can generate videos of relatively high resolution and duration, but struggle with complex open-domain videos. Transformer-based methods are capable of generating complex open-domain videos, but suffer from unidirectional bias and error accumulation, which make high-fidelity generation difficult. Diffusion models generalize well, but their slow inference and high memory consumption make it challenging to generate long, high-definition videos. The paper then reviews the evaluation benchmarks and metrics used in the field and compares the performance of existing mainstream methods. Finally, potential directions for future research are outlined.
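The abstract notes that diffusion models are limited by inference speed: sampling requires many sequential denoising steps, each of which in a real system is a full forward pass of a large text-conditioned network. The toy sketch below, a minimal 1-D DDPM-style reverse loop with the noise-prediction network replaced by a placeholder that predicts zero (all names and schedule values are illustrative, not taken from the surveyed methods), makes this cost explicit: generating a single sample takes T sequential iterations.

```python
import math
import random

def ddpm_reverse_sketch(T=50, seed=0):
    """Toy 1-D DDPM-style reverse (denoising) loop.

    In a real text-to-video diffusion model, eps_pred would be the
    output of a large conditional network evaluated once per step,
    hence T sequential forward passes per generated sample.
    """
    rng = random.Random(seed)
    # Linear noise schedule beta_1..beta_T (a common toy choice).
    betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    x = rng.gauss(0.0, 1.0)          # start from pure Gaussian noise
    for t in reversed(range(T)):     # T sequential denoising steps
        eps_pred = 0.0               # placeholder for the model's noise prediction
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps_pred) / math.sqrt(alphas[t])
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0
        x = mean + math.sqrt(betas[t]) * noise
    return x

print(ddpm_reverse_sketch())
```

Because the cost scales linearly with T, reducing the number of denoising steps is the main lever for faster sampling, which is why inference speed remains a central constraint for video-scale diffusion.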

Keywords: Text-to-video generation; Diffusion model; Generative Adversarial Network (GAN)

Deng Zijun (邓梓焌), He Xiangteng (何相腾), Peng Yuxin (彭宇新)


Wangxuan Institute of Computer Technology, Peking University, Beijing 100080, China


Funding: National Natural Science Foundation of China (61925201, 62132001, 62272013)

2024

Journal of Electronics & Information Technology (电子与信息学报)
Institute of Electronics, Chinese Academy of Sciences; Department of Information Sciences, National Natural Science Foundation of China


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.302
ISSN: 1009-5896
Year, volume (issue): 2024, 46(5)