Text-to-Video Generation: Research Status, Progress, and Challenges
Text-to-video generation aims to produce semantically consistent, photo-realistic, temporally consistent, and logically coherent videos from textual descriptions. This paper first surveys the current state of research in text-to-video generation, providing a detailed overview of three mainstream approaches: methods based on recurrent networks and Generative Adversarial Networks (GANs), methods based on Transformers, and methods based on diffusion models. Each class of models has its strengths and weaknesses in video generation. Methods based on recurrent networks and GANs can generate videos of higher resolution and longer duration but struggle with complex open-domain content. Transformer-based methods are proficient at generating open-domain videos but suffer from unidirectional bias and error accumulation, which makes high-fidelity generation difficult. Diffusion models exhibit good generalization but are constrained by slow inference and high memory consumption, which makes it challenging to generate high-definition, long videos. The paper then reviews evaluation benchmarks and metrics for text-to-video generation and compares the performance of existing methods. Finally, potential future research directions in the field are outlined.