Semantic and prosodic feature embedding method in emotional speech synthesis

Current emotional speech synthesis methods tend to ignore the semantic information of the text in the synthesized audio. To address this problem, a pre-trained BERT model is introduced into the text encoder to help it capture the semantic features of the text, and a method for embedding semantic and prosodic features is proposed. The lack of Myanmar-language emotional corpora makes it difficult for a model to synthesize high-quality emotional speech; this paper therefore explores how to train a Myanmar emotional speech synthesis model by fine-tuning the parameters of each network module. Experimental results show that the proposed feature embedding method and training method can still synthesize high-quality emotional speech when emotional corpora are scarce, with average emotional opinion scores of 4.16 and 4.18, respectively.

Myanmar language; emotional speech synthesis; semantic feature; prosodic feature; fine-tuning

Shi Fan (石凡), Yang Jian (杨鉴)

School of Information, Yunnan University, Kunming 650000, China

Supported by the National Natural Science Foundation of China (Grant No. 61961043)

2024

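Information Technology (信息技术)
Heilongjiang Information Technology Society; China Center for Information Industry Development; Electronic Information Center of the Ministry of Information Industry of China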

CSTPCD
Impact factor: 0.413
ISSN:1009-2552
Year, volume (issue): 2024.(7)