A Survey of Voice Conversion Based on Non-Parallel Data

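Abstract (translated from the Chinese): Voice conversion is a research topic in speech processing and artificial intelligence. Its goal is to change the timbre of an utterance while keeping the content of the source speech unchanged, so that it sounds as if it were spoken by a different, target speaker, while also preserving speech quality and naturalness. Voice conversion for non-parallel corpora is currently an active research area: models are trained on non-parallel multi-speaker speech datasets and can perform many-to-many as well as any-to-any conversion. This paper gives a comprehensive summary and analysis of recent work on voice conversion for non-parallel corpora. It first outlines early voice conversion based on parallel corpora and its shortcomings, then introduces and compares the various approaches to non-parallel voice conversion, and finally summarizes the field and discusses future directions.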

English title: A survey of voice conversion based on non-parallel data

English abstract: Voice conversion is a research topic in the fields of speech and artificial intelligence. Its goal is to change the timbre of speech while preserving the content of the source speech, making it sound as if it were spoken by the target speaker, while ensuring the quality and naturalness of the converted speech. Voice conversion based on non-parallel data is currently attracting much attention: models are trained on non-parallel multi-speaker speech datasets, enabling many-to-many and any-to-any voice conversion. This paper provides a comprehensive summary and analysis of recent developments in non-parallel voice conversion. First, we outline early voice conversion techniques based on parallel corpora and their limitations. Then, we introduce and compare the various approaches to voice conversion based on non-parallel data. Finally, we summarize voice conversion technology and give an outlook on its future.

English keywords:

voice conversion; artificial intelligence; deep learning

Authors:

Li Pengcheng (李鹏程), Zhang Xulong (张旭龙), Wang Jianzong (王健宗), Cheng Ning (程宁), Xiao Jing (肖京)

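Author affiliations:

Ping An Technology (Shenzhen) Co., Ltd., Shenzhen, Guangdong 518063

University of Science and Technology of China, Hefei, Anhui 230026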

Keywords:

voice conversion; artificial intelligence; deep learning

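Funding:

Key-Area Research and Development Program of Guangdong Province, "New Generation Artificial Intelligence" Major Special Project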

Project number:

2021B0101400003

Publication year:

2024

DOI:

10.11959/j.issn.2096-0271.2024011

Big Data Research (大数据)

Posts & Telecom Press (人民邮电出版社)

CSTPCD

ISSN: 2096-0271

Year, volume (issue): 2024, 10(3)