To address the poor speech recognition performance on short texts read aloud in Uyghur-accented Mandarin, this paper builds a speech dataset of such readings, named CH_ESSAY_SET. Comparative experiments against the public datasets Aishell_1 and WenetSpeech, and against commercial speech recognition interfaces such as iFLYTEK, Baidu, Tencent, and Yunzhisheng, show that an end-to-end acoustic model trained on the self-built dataset recognizes Uyghur-accented Mandarin short texts significantly more accurately than either the models trained on the public datasets or the commercial interfaces, which verifies the effectiveness of the self-built dataset. Two optimization methods are then proposed: multilingual multi-task training that uses transfer learning for feature transfer, and a pre-trained system built on the WeNet framework. Experiments show that the proposed optimization improves recognition accuracy by 8.9% over the baseline system, with a word error rate of 7.5%.
Key words
speech recognition/Uyghur accent/end-to-end/corpus construction