Digital talking face generation has become an increasingly important research topic at the intersection of computer vision and natural language processing. The technology focuses on generating realistic facial images driven by a given text or audio sequence. In recent years, deep learning methods such as convolutional neural networks, generative adversarial networks, and neural radiance fields have been applied to digital talking face generation, demonstrating significant research and application value. These methods have not only attracted widespread attention from the academic community but have also been applied in industry to solve specific problems in image processing and computer vision. Although progress has been made, the practical application of these technologies still faces many challenges. This paper comprehensively reviews and evaluates the specific implementations of deep learning methods for digital talking face generation in order to identify the advantages and disadvantages of existing methods, explore common problems that need to be solved, and highlight open issues that require further research. In addition, currently available datasets are listed, evaluated, and compared from a statistical perspective so that researchers can more easily choose datasets that meet their needs.
Key words
digital talking face generation/virtual human/audio-driven