Research on the Construction of Mongolian Speech Recognition Corpus Based on Speech-Text Alignment Technology
At present, the speech recognition data available for Mongolian is far smaller in scale than the training data available for English and Chinese. A low-cost dataset construction method is therefore needed to make up for this shortage. A large amount of Mongolian language data is generated in everyday interactions, and much of it takes the form of speech paired with roughly corresponding text. The experiments adopt the technical route of extracting from such a raw corpus an annotated corpus that can be used for training, and TV dubbing scripts and the corresponding finished films are selected as samples of the raw corpus. Refining the raw corpus is treated as a speech-text alignment problem. Through a series of automated processes, the script and the corresponding audio are converted into a form suitable for speech-text alignment processing, and an iterative alignment method is used to obtain the speech-text alignment results, thus generating "speech-text pairs" for Mongolian speech recognition. A sample check of the generated data reveals that it is of good quality and basically consistent with manual annotation, which saves the cost of data production.
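To make the idea of aligning a rough script against audio concrete, the following is a minimal sketch of one alignment pass. It is not the authors' pipeline. It assumes that a recognizer has already produced a word-level hypothesis with timestamps, and it uses word matching from Python's standard `difflib` as a stand-in for the alignment step. The function name `align_sentence_times`, the input layout, and the `min_ratio` threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def align_sentence_times(script_sentences, asr_words, min_ratio=0.6):
    """One pass of anchor-based alignment (illustrative assumption).

    script_sentences: list of sentences, each a list of word strings.
    asr_words: list of (word, start_sec, end_sec) from a recognizer.
    Returns (sentence_index, start_sec, end_sec) for every sentence whose
    share of matched words reaches min_ratio.
    """
    # Flatten the script, remembering which sentence each token came from.
    script_tokens, owner = [], []
    for idx, sentence in enumerate(script_sentences):
        for token in sentence:
            script_tokens.append(token)
            owner.append(idx)

    hyp_tokens = [word for word, _, _ in asr_words]
    matcher = SequenceMatcher(a=script_tokens, b=hyp_tokens, autojunk=False)

    # Collect, per sentence, the hypothesis positions that matched exactly.
    matched = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            matched.setdefault(owner[block.a + k], []).append(block.b + k)

    accepted = []
    for idx, sentence in enumerate(script_sentences):
        hits = matched.get(idx, [])
        if sentence and len(hits) / len(sentence) >= min_ratio:
            start = asr_words[hits[0]][1]   # start of first matched word
            end = asr_words[hits[-1]][2]    # end of last matched word
            accepted.append((idx, start, end))
    return accepted


if __name__ == "__main__":
    script = [["a", "b", "c"], ["d", "e", "f"]]
    hypothesis = [("a", 0.0, 0.5), ("b", 0.5, 1.0), ("c", 1.0, 1.5),
                  ("x", 1.5, 2.0), ("y", 2.0, 2.5), ("f", 2.5, 3.0)]
    print(align_sentence_times(script, hypothesis))
```

In an iterative scheme of the kind the abstract mentions, the sentences rejected by one pass would be retried in a later pass, for example after the recognizer has been improved using the pairs already accepted. How the iteration is actually organized is not specified here, so that outer loop is left out of the sketch.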