At present, manual scoring remains the main method of grading test items in the Putonghua Proficiency Test (PSC). The development of oral evaluation technology has greatly reduced the labor that manual scoring requires. As the core of spoken language evaluation technology, the end-to-end model proposed in this paper combines CTC and Transformer and uses multi-encoder and word-level consistency methods to reduce the recognition error rate in complex recording environments, reaching a word error rate of 5.6% on the test set. With pre-training, the end-to-end model can also classify pronunciations as correct or incorrect for mispronunciation detection and diagnosis. Its performance is 16% higher than the detection results of the better-performing traditional machine learning model, which effectively improves the mispronunciation detection and diagnosis rate and yields a result of 0.589.
Keywords
speech recognition/mispronunciation detection and diagnosis/corpus construction/deep learning/Transformer
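
The abstract reports a word error rate on the test set. As a minimal sketch of how such a figure is conventionally computed (the standard edit-distance definition, not necessarily the exact scoring script used in this paper), the function below divides the number of substitutions, deletions, and insertions by the number of reference tokens. The function name and the sample token sequences are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Return edit distance between token lists divided by reference length."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j tokens of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]  # aligning i reference tokens with an empty hypothesis
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token is missing
                curr[j - 1] + 1,     # insertion: extra hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical example: one deleted token ("ma") and one substitution
# ("hao" recognized as "gao") against a six-token reference.
reference = "ni hao ma wo hen hao".split()
hypothesis = "ni hao wo hen gao".split()
print(f"WER = {word_error_rate(reference, hypothesis):.3f}")  # 2 edits / 6 tokens
```

Under this definition, a word error rate of 5.6% corresponds to roughly 5.6 token-level errors per 100 reference tokens.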