
A survey of voice conversion based on non-parallel data

Voice conversion is a research topic in speech processing and artificial intelligence. Its goal is to change the timbre of an utterance while preserving its linguistic content, so that it sounds as if it were spoken by a target speaker, while maintaining the quality and naturalness of the converted speech. Voice conversion based on non-parallel data is currently an active research area: models are trained on non-parallel, multi-speaker speech datasets and can perform many-to-many and any-to-any conversion. This paper provides a comprehensive summary and analysis of recent work on non-parallel voice conversion. We first outline early voice conversion techniques based on parallel corpora and their limitations, then introduce and compare the main approaches to voice conversion based on non-parallel data, and finally summarize the field and discuss future directions.

Keywords: voice conversion; artificial intelligence; deep learning

Authors: Li Pengcheng (李鹏程), Zhang Xulong (张旭龙), Wang Jianzong (王健宗), Cheng Ning (程宁), Xiao Jing (肖京)

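Ping An Technology (Shenzhen) Co., Ltd., Shenzhen, Guangdong 518063, China

University of Science and Technology of China, Hefei, Anhui 230026, China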

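Funding: Key-Area Research and Development Program of Guangdong Province, "New Generation Artificial Intelligence" major special project (No. 2021B0101400003)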

Journal: 大数据 (Big Data Research)
Publisher: 人民邮电出版社 (Posts & Telecom Press)
Indexed in: CSTPCD
ISSN: 2096-0271
Year, volume (issue): 2024, 10(3)