
Tibetan Speech Recognition Based on Cross-lingual Transfer

In recent years, with the continuous progress of deep learning, end-to-end architectures have shown excellent performance in speech recognition, but this performance usually relies on large amounts of annotated data. For languages with abundant corpora and rich resources, the approach has already achieved satisfactory recognition results; for low-resource languages with scarce corpora, however, the lack of training data becomes the bottleneck in building a speech recognition system. To address the drop in model performance caused by insufficient training data in low-resource languages, this paper applies transfer learning, data augmentation, and related optimizations, taking the Amdo dialect of Tibetan as the low-resource language under study. Within an end-to-end architecture, a base acoustic model trained on other resource-rich corpora is retrained with Tibetan speech data, yielding a better-performing acoustic model for Tibetan speech recognition. Experimental results show that, compared with the baseline system, combining self-supervised feature extraction, pretrained-model parameter initialization, and data augmentation reduces the error rate by 43.75%.
Tibetan Speech Recognition Based on Cross-language Transfer
In recent years, with the continuous progress of deep learning technology, end-to-end speech recognition architectures have shown excellent performance in the field of speech recognition. However, achieving this performance usually requires a large amount of annotated data. Satisfactory recognition results have been obtained for languages with abundant corpora and rich resources, but for low-resource languages with relatively scarce corpora, the lack of training data has become a bottleneck in building speech recognition systems. To address the model performance degradation caused by insufficient training data in low-resource languages, this paper applies transfer learning, data augmentation, and other methods, taking the Amdo dialect of Tibetan as the low-resource language under study. In an end-to-end architecture, Tibetan speech data are used to retrain a base acoustic model trained on other resource-rich corpora, so as to build a better acoustic model for Tibetan speech recognition. Experimental results show that using self-supervised feature extraction, pretrained-model parameter initialization, and data augmentation reduces the relative error rate by 30.4% compared with the baseline system.
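The abstract names three ingredients: a self-supervised feature extractor, initialization from a model pretrained on resource-rich speech, and data augmentation before fine-tuning on Tibetan (Amdo) data. The sketch below illustrates how such a pipeline is commonly set up; it assumes a wav2vec 2.0-style multilingual encoder loaded through Hugging Face Transformers, and the checkpoint name, vocabulary file, and hyperparameter values are illustrative assumptions rather than the paper's actual configuration.

```python
# Minimal sketch of cross-lingual transfer for low-resource ASR (assumptions noted inline).
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2CTCTokenizer

# Hypothetical character-level vocabulary for Amdo Tibetan transcripts.
tokenizer = Wav2Vec2CTCTokenizer(
    "tibetan_amdo_vocab.json",
    unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|",
)

# Pretrained-model initialization: load a multilingual self-supervised encoder
# (assumed checkpoint) and attach a new CTC head sized for the Tibetan character set.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    vocab_size=len(tokenizer),
    ctc_loss_reduction="mean",
    pad_token_id=tokenizer.pad_token_id,
    mask_time_prob=0.065,     # SpecAugment-style time masking on latent features
    mask_feature_prob=0.012,  # and feature-channel masking (data augmentation)
)

# Transfer learning: freeze the low-level convolutional feature extractor learned
# self-supervised on resource-rich speech, and fine-tune the Transformer layers
# plus the new CTC head on the much smaller Tibetan corpus.
model.freeze_feature_encoder()

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=3e-5)
```

In this setup, only the fine-tuning data and the output vocabulary are Tibetan-specific; the acoustic representations are inherited from the resource-rich pretraining, which is the core idea of the cross-lingual transfer described above.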

speech recognition; transfer learning; open-source data

李运鹏、张钦雅、者润玉、李冠宇


Key Laboratory of China's Ethnic Languages and Information Technology of the Ministry of Education, Northwest Minzu University, Lanzhou 730000, Gansu, China

speech recognition; transfer learning; data augmentation

2024

Journal of Minzu University of China (Natural Sciences Edition)
Minzu University of China


Impact factor: 0.462
ISSN: 1005-8036
Year, Volume (Issue): 2024, 33(4)