大数据 (Big Data Research), 2024, Vol. 10, Issue 3: 65-81. DOI: 10.11959/j.issn.2096-0271.2024011

面向非平行语料的语音转换技术综述

A survey of voice conversion based on non-parallel data

李鹏程 (Li Pengcheng) 1, 张旭龙 (Zhang Xulong) 2, 王健宗 (Wang Jianzong) 2, 程宁 (Cheng Ning) 2, 肖京 (Xiao Jing) 2

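Author information

  • 1. Ping An Technology (Shenzhen) Co., Ltd., Shenzhen, Guangdong 518063; University of Science and Technology of China, Hefei, Anhui 230026
  • 2. Ping An Technology (Shenzhen) Co., Ltd., Shenzhen, Guangdong 518063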

Abstract

Voice conversion is a research topic in speech processing and artificial intelligence. Its goal is to change the timbre of an utterance while preserving the content of the source speech, so that it sounds as if it were spoken by a target speaker, while also maintaining the quality and naturalness of the converted speech. Voice conversion based on non-parallel data is currently an active research area: models are trained on non-parallel multi-speaker speech datasets and can perform many-to-many and any-to-any conversion. This paper provides a comprehensive summary and analysis of recent work on non-parallel voice conversion. First, we outline early voice conversion techniques based on parallel corpora and their limitations. Then, we introduce and compare the various approaches to voice conversion based on non-parallel data. Finally, we summarize the field and discuss the outlook for voice conversion technology.

Key words

voice conversion/artificial intelligence/deep learning

Funding

Key-Area Research and Development Program of Guangdong Province, "New Generation Artificial Intelligence" Major Project (2021B0101400003)

Publication year

2024
大数据 (Big Data Research)
人民邮电出版社 (Posts & Telecom Press)

CSTPCD
ISSN:2096-0271