Computer Technology and Development, 2024, Vol. 34, Issue 4: 146-152. DOI: 10.20165/j.cnki.ISSN1673-629X.2024.0022

Research on Low-resource Qingdao Dialect Speech Recognition Method

Xiang Zihan (相紫涵), Gu Xiao (谷潇), Rao Chongzhi (饶崇郅), Jian Ling (渐令)

Author information

  • 1. School of Economics and Management, China University of Petroleum (East China), Qingdao, Shandong 266580, China

Abstract

Dialect recognition is an important research direction in automatic speech recognition. Common speech recognition systems are trained on the standard language, which results in poor performance on dialects. In view of this, we choose the Qingdao dialect as an application case for dialect speech recognition research. To address the scarcity of dialect corpora and the difficulty of training deep network models, both of which limit recognition accuracy, we apply data augmentation and build a dialect speech recognition model based on an improved Conformer. First, multi-source speech data are collected to construct a small-scale dialect corpus. Second, data augmentation techniques are applied to expand the training data and mitigate the data scarcity. Finally, to better extract information, the down-sampling structure of the Conformer model is improved by introducing dilated convolution and the Mish activation function, realizing a direct mapping from speech to text. Experimental results show that the end-to-end model with the improved down-sampling module, combined with data augmentation, reaches a character error rate of 25.96%, effectively realizing dialect recognition under low-resource conditions.
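The abstract names three quantities that can be pinned down concretely: the Mish activation, the effect of dilation on a convolution's output length, and the character error rate used to report results. The following is a minimal pure-Python sketch of their standard textbook definitions, not the authors' implementation. The function names are hypothetical, and the abstract does not give kernel sizes, strides, dilation rates, or the specific augmentation methods, so any values passed in are illustrative assumptions.

```python
import math


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x)."""
    # Numerically stable softplus: max(x, 0) + log1p(exp(-|x|)).
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(softplus)


def dilated_conv_out_len(n_in: int, kernel: int, stride: int = 1,
                         dilation: int = 1, padding: int = 0) -> int:
    """Output length along one axis of a convolution (standard formula).

    Dilation spreads the kernel taps apart, so a kernel of size k covers
    dilation * (k - 1) + 1 input positions without adding parameters.
    """
    effective_kernel = dilation * (kernel - 1) + 1
    if n_in + 2 * padding < effective_kernel:
        raise ValueError("input is shorter than the effective kernel")
    return (n_in + 2 * padding - effective_kernel) // stride + 1


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)
```

Under these definitions, a reported character error rate of 25.96% means that, summed over the test set, the substitutions, deletions, and insertions needed to turn the recognized text into the reference amount to roughly a quarter of the reference characters. Corpus-level scoring normally sums edit distances and reference lengths over all utterances before dividing; the per-utterance function above is only the building block.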

Key words

speech recognition/end-to-end/low resource/data augmentation/Qingdao dialect

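Funding

National Key Research and Development Program of China (2021YFA1000100)

National Key Research and Development Program of China (2021YFA1000102)

Publication year

2024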
Computer Technology and Development
Shaanxi Computer Society

CSTPCD
Impact factor: 0.621
ISSN: 1673-629X
Number of references: 19