Survey of audio-driven talking face generation technology

Zhang Bingyuan¹, Zhang Xulong², Wang Jianzong², Cheng Ning², Xiao Jing²

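Author information

1. Ping An Technology (Shenzhen) Co., Ltd., Shenzhen, Guangdong 518000; Institute of Advanced Technology, University of Science and Technology of China, Hefei, Anhui 230000
2. Ping An Technology (Shenzhen) Co., Ltd., Shenzhen, Guangdong 518000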

Abstract

Digital talking face generation has become an increasingly important research topic at the intersection of computer vision and natural language processing. The technology focuses on generating realistic facial images from a given text or audio sequence. In recent years, deep learning methods such as convolutional neural networks, generative adversarial networks, and neural radiance fields have shown significant value in this area. These methods have not only attracted wide attention from academia but have also been applied in industry to solve concrete problems in image processing and computer vision. Despite this progress, applying these methods in practice still faces many challenges. This paper comprehensively reviews and evaluates deep learning approaches to digital talking face generation in order to identify the strengths and weaknesses of existing methods, discuss common unresolved problems, and highlight open issues that require further research. In addition, currently available datasets are listed, evaluated, and compared from a statistical perspective so that researchers can more easily choose datasets that meet their needs.

Key words

digital talking face generation/virtual human/audio-driven

Funding

Key-Area Research and Development Program of Guangdong Province, "New Generation Artificial Intelligence" Major Project (2021B0101400003)

Publication year

2024

大数据 (Big Data Research)

Posts & Telecom Press

CSTPCD

ISSN: 2096-0271
