Semantic and Prosodic Feature Embedding Method in Emotional Speech Synthesis (情感语音合成中的语义及韵律特征嵌入方法)

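Abstract (translated from the Chinese): Current emotional speech synthesis methods tend to produce audio that ignores the semantic information of the text. To address this, a BERT pre-trained model is introduced into the text encoder to help it capture textual semantic features, and a semantic and prosodic feature embedding method is proposed. Because the scarcity of Myanmar emotional corpora makes it difficult for a model to synthesize high-quality emotional speech, the paper also explores a training method for a Myanmar emotional speech synthesis model that fine-tunes the parameters of each network module. Experimental results show that the proposed feature embedding method and training method can still synthesize high-quality emotional speech when emotional corpora are scarce, with mean emotional opinion scores of 4.16 and 4.18, respectively.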

English title: Semantic and prosodic feature embedding method in emotional speech synthesis

English abstract: To address the problem that current emotional speech synthesis methods easily ignore the semantic information of the text, a BERT pre-trained model is introduced into the text encoder to help the encoder capture textual semantic features, and an embedding method for semantic and prosodic features is proposed. The lack of Myanmar-language emotional corpora makes it difficult for a model to synthesize high-quality emotional speech; therefore, this paper explores a training method for a Myanmar-language emotional speech synthesis model based on fine-tuning the parameters of each network module. The experimental results show that the feature embedding method and the training method proposed in this paper can still synthesize high-quality emotional speech when emotional corpora are scarce, with average emotional opinion scores of 4.16 and 4.18, respectively.

English keywords:

Myanmar language; emotional speech synthesis; semantic feature; prosodic feature; fine-tuning

Authors:

Shi Fan (石凡), Yang Jian (杨鉴)

Author affiliation:

School of Information Science and Engineering, Yunnan University (云南大学信息学院), Kunming 650000

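Keywords (translated from the Chinese):

Myanmar language; emotional speech synthesis; semantic features; prosodic features; fine-tuning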

Funding:

National Natural Science Foundation of China

Project number:

61961043

Year of publication:

2024

DOI:

10.13274/j.cnki.hdzj.2024.07.005

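Information Technology (信息技术)

Heilongjiang Information Technology Society (黑龙江省信息技术学会); China Center for Information Industry Development (中国电子信息产业发展研究院); Electronic Information Center of the Ministry of Information Industry of China (中国信息产业部电子信息中心)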

CSTPCD

Impact factor: 0.413

ISSN: 1009-2552

Year, volume (issue): 2024, (7)