
Low-resource Non-autoregressive Zhuang Speech Synthesis

This paper introduces Zhuang-TTS, a non-autoregressive Zhuang text-to-speech model based on FastSpeech2. To improve the prosody of synthesized Zhuang speech, a new Zhuang phonological inventory (tones, initials or consonants, and finals or vowels) is proposed, based on the characteristics of the language and on field investigations. Three changes target the acoustic characteristics of Zhuang: (i) Zhuang phoneme sequences represent pronunciation information; (ii) a phoneme-level acoustic regulator, similar to that of FastPitch, makes synthesis results more stable; (iii) the Conformer replaces the Transformer in FastSpeech2. A Zhuang speech synthesis corpus is also constructed. Experimental results show that Zhuang-TTS achieves a Mean Opinion Score (MOS) of 3.90 for prosody and a synthesis real-time rate of 8.65×10⁻². The model substantially improves the quality and speed of Zhuang speech synthesis, outperforms the Tacotron2 and FastSpeech2 baselines, and advances the field of Zhuang speech synthesis.
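Two quantities in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything below is an assumption: `phoneme_level_pitch` shows one common way to obtain phoneme-level acoustic targets (averaging frame-level pitch over each phoneme's frames, in the spirit of FastPitch), and `real_time_factor` uses the conventional definition of synthesis time divided by the duration of the generated audio, which may or may not match the paper's "real-time rate". All function and parameter names are hypothetical.

```python
from typing import List


def phoneme_level_pitch(frame_pitch: List[float], durations: List[int]) -> List[float]:
    """Average frame-level pitch over the frames aligned to each phoneme.

    durations[i] is the number of frames assigned to phoneme i (a hypothetical
    alignment; the source does not say how alignments are obtained).
    Frames with pitch 0.0 are treated as unvoiced and skipped in the average.
    """
    if sum(durations) != len(frame_pitch):
        raise ValueError("durations must sum to the number of frames")
    result = []
    start = 0
    for d in durations:
        voiced = [p for p in frame_pitch[start:start + d] if p > 0.0]
        # A phoneme with no voiced frames gets 0.0 as its pitch value.
        result.append(sum(voiced) / len(voiced) if voiced else 0.0)
        start += d
    return result


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Conventional real-time factor: processing time per second of audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds
```

Under this assumed definition, a value of 8.65×10⁻² would mean that about 0.0865 s of computation produces 1 s of audio, for example `real_time_factor(0.865, 10.0)`.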

Zhuang language speech synthesis; non-autoregressive acoustic model; non-autoregressive vocoder; Conformer

Wang Jie, Qin Donghong


School of Artificial Intelligence, Guangxi Minzu University, Nanning, Guangxi 530006


Guangxi Science and Technology Base and Talent Special Project; Guangxi Minzu University Horizontal Research Project

Guike AD23026054; 2022450016000429

2024

Journal of Minzu University of China (Natural Sciences Edition)
Minzu University of China


Impact factor: 0.462
ISSN: 1005-8036
Year, volume (issue): 2024, 33(2)