

Mixture Density Network-Based Hmong Language Text-to-Speech Method
Research on Hmong text-to-speech (TTS) is of great significance for the inheritance, protection, and development of ethnic culture. Hmong TTS research lags behind because the language lacks a written script, has few electronic resources, and its speech data are difficult to obtain. To address these problems, a mixture density network (MDN)-based Hmong speech synthesis method is proposed. The method learns the alignment between text and speech from durations, which avoids the skipped and repeated words that can occur when alignment is learned with an attention mechanism. The MDN extracts the ground-truth duration of each text token and is trained jointly with the duration predictor, so no external aligner or autoregressive model is needed to guide alignment learning, which simplifies model training. Comparative experiments against advanced methods were conducted on Hmong_data, a self-built Hmong TTS corpus, as the benchmark. The proposed method achieves a mean opinion score (MOS) of 3.89, 0.41 higher than Tacotron2; it also produces clearer and smoother alignment maps, and the synthesized speech is intelligible and correct.
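The duration-based alignment described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, in the spirit of MDN-based aligners such as AlignTTS, that the MDN models each text token as one diagonal Gaussian over mel frames, that training maximizes the likelihood summed over all monotonic text-to-frame alignments (forward algorithm), and that durations are read off the single best alignment (Viterbi) and then used by a FastSpeech-style length regulator. All function names are hypothetical.

```python
import math
from typing import List, Sequence

NEG_INF = -math.inf


def gaussian_log_likelihood(frame: Sequence[float], mean: Sequence[float],
                            log_std: Sequence[float]) -> float:
    """Log density of one mel frame under one token's diagonal Gaussian (assumed MDN output)."""
    total = 0.0
    for x, mu, ls in zip(frame, mean, log_std):
        z = (x - mu) / math.exp(ls)
        total += -0.5 * math.log(2 * math.pi) - ls - 0.5 * z * z
    return total


def log_likelihood_matrix(frames, means, log_stds) -> List[List[float]]:
    """ll[t][j] = log p(frame t | token j)."""
    return [[gaussian_log_likelihood(f, m, s) for m, s in zip(means, log_stds)]
            for f in frames]


def _logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mdn_alignment_nll(ll: List[List[float]]) -> float:
    """Hypothetical training loss: -log of the likelihood summed over all monotonic
    alignments. Each frame stays on the current token or advances by exactly one."""
    n_tokens = len(ll[0])
    alpha = [NEG_INF] * n_tokens
    alpha[0] = ll[0][0]  # alignment must start at the first token
    for t in range(1, len(ll)):
        alpha = [_logaddexp(alpha[j], alpha[j - 1] if j > 0 else NEG_INF) + ll[t][j]
                 for j in range(n_tokens)]
    return -alpha[n_tokens - 1]  # and must end at the last token


def viterbi_durations(ll: List[List[float]]) -> List[int]:
    """Extract per-token durations (frame counts) from the best monotonic alignment."""
    n_frames, n_tokens = len(ll), len(ll[0])
    if n_frames < n_tokens:
        raise ValueError("need at least one frame per token")
    score = [[NEG_INF] * n_tokens for _ in range(n_frames)]
    back = [[0] * n_tokens for _ in range(n_frames)]
    score[0][0] = ll[0][0]
    for t in range(1, n_frames):
        for j in range(n_tokens):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else NEG_INF
            prev_j, best = (j - 1, move) if move > stay else (j, stay)
            score[t][j] = best + ll[t][j]
            back[t][j] = prev_j
    durations = [0] * n_tokens
    j = n_tokens - 1
    for t in range(n_frames - 1, -1, -1):  # backtrack from the last frame
        durations[j] += 1
        j = back[t][j]
    return durations


def length_regulate(token_states: Sequence, durations: Sequence[int]) -> list:
    """Repeat each token's hidden state durations[j] times to match the frame rate."""
    out = []
    for state, d in zip(token_states, durations):
        out.extend([state] * d)
    return out


if __name__ == "__main__":
    # Toy example: two tokens centred at 0.0 and 5.0, five 1-D "mel" frames.
    ll = log_likelihood_matrix([[0.1], [-0.2], [4.9], [5.2], [5.0]],
                               [[0.0], [5.0]], [[0.0], [0.0]])
    d = viterbi_durations(ll)
    print(d, length_regulate(["h0", "h1"], d), round(mdn_alignment_nll(ll), 3))
```

In this toy run the first two frames align to the first token and the last three to the second, so the extracted durations are `[2, 3]`. Because the durations come from the same model that is being trained, a duration predictor could be fitted to them in the same run, which is one way to read the abstract's claim that no external aligner is required.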

Keywords: Hmong language; text-to-speech; mixture density network; corpus

Cai Shan (蔡姗), Guo Sheng (郭胜), Wang Lin (王林)


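School of Data Science and Information Engineering, Guizhou Minzu University

Key Laboratory of Pattern Recognition and Intelligent Systems of Guizhou Province, Guiyang 550025, Guizhou, China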


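Funding: Guizhou Provincial Science and Technology Program (黔科合基础-ZK[2022]一般195; 黔科合基础-ZK[2023]一般143); Natural Science Research Project of the Guizhou Provincial Department of Education (黔教技[2023]061号; 黔教技[2023]012号)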

Software Guide (软件导刊), Hubei Information Society

Year, volume (issue): 2024, 23(4)

ISSN: 1672-7800

Impact factor: 0.524