Tibetan Speech Recognition for Cross-language Transfer
In recent years, with the continuing progress of deep learning, end-to-end architectures have shown excellent performance in speech recognition. Achieving this performance, however, usually requires a large amount of annotated data. Satisfactory recognition results have been obtained for languages with abundant corpora and resources, but for low-resource languages with relatively scarce corpora, the lack of training data has become a bottleneck in building speech recognition systems. To address the performance degradation caused by insufficient training data for low-resource languages, this paper applies transfer learning, data augmentation, and related methods. Amdo Tibetan is taken as the low-resource language under study. Within the end-to-end architecture, Tibetan speech data is used to retrain a base acoustic model trained on other resource-rich corpora, so as to build a better acoustic model for Tibetan speech recognition. Experimental results show that combining self-supervised feature extraction, initialization from pre-trained model parameters, and data augmentation reduces the error rate by 30.4% relative to the baseline system.
speech recognition; transfer learning; open-source data
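The relative error rate reduction quoted in the abstract is conventionally computed as the drop in error rate divided by the baseline error rate. The following is a minimal sketch of that arithmetic, assuming this standard definition; the function name and the input values are hypothetical and are not taken from the paper, which does not state absolute error rates in this section.

```python
def relative_error_reduction(baseline_error: float, system_error: float) -> float:
    """Return the relative error rate reduction as a fraction of the baseline.

    Both arguments are error rates in the same unit (for example, percent).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - system_error) / baseline_error


# Hypothetical values for illustration only; these are NOT results from the paper.
reduction = relative_error_reduction(baseline_error=40.0, system_error=30.0)
print(f"{reduction:.1%}")  # prints "25.0%"
```

Under this definition, a 30.4% relative reduction means the improved system's error rate is 69.6% of the baseline's, whatever the absolute values are.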