Research on Speech Recognition of Xinjiang Uyghur-Accented Mandarin Short Texts

Yang Xingyao¹, Xiao Rui¹, Lu Jintang¹

Author information

1. School of Software, Xinjiang University, Urumqi, Xinjiang 830008

Abstract

To address the unsatisfactory recognition of short Mandarin texts read aloud with a Uyghur accent, this paper builds CH_ESSAY_SET, a speech dataset of short Mandarin texts read aloud with a Uyghur accent. Comparative experiments are carried out against the public datasets Aishell-1 and WenetSpeech and against the speech recognition interfaces of iFLYTEK, Baidu, Tencent, and Yunzhisheng. The experiments show that the end-to-end acoustic model trained on the self-built dataset recognizes Uyghur-accented Mandarin short texts markedly more accurately than both the public datasets and the speech recognition interfaces, which verifies the effectiveness of the self-built dataset. The paper further proposes two optimization methods: multilingual task training based on transfer learning for feature transfer, and a pre-training system based on the WeNet framework. The experiments show that the proposed optimization methods improve recognition accuracy by 8.9% over the baseline system, reaching a character error rate of 7.5%.
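The headline result is reported as a character error rate (CER). As a minimal sketch of how such a figure is conventionally computed (this is the standard edit-distance definition, not code from the paper; the function names `edit_distance` and `cer` are illustrative), CER is the number of character substitutions, insertions, and deletions needed to turn the recognized text into the reference, divided by the number of reference characters:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance over reference length.

    Spaces are stripped first, an assumption that suits Mandarin text,
    which is normally scored character by character.
    """
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

Under this definition, a CER of 7.5% corresponds to roughly 7.5 character errors per 100 reference characters. Because insertions count as errors, the rate can exceed 100% on very poor output.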

Key words

speech recognition/Uyghur accent/end-to-end/corpus construction

Publication year

2024

Journal of Northeast Normal University (Natural Science Edition)

Northeast Normal University

CSTPCD; Peking University Core Journals

Impact factor: 0.612

ISSN: 1000-1832
