End-to-End Endangered Language Speech Recognition Using Language Modeling

An effective way to protect an endangered language is to preserve its audio and video data, which in turn requires native speakers and professional linguists to annotate the corpus. Tujia is an endangered language with no writing system. Because corpus resources are scarce and its grammatical structure is distinctive, speech recognition accuracy for Tujia is low, and existing recognition stops at the phonetic level. This paper therefore proposes an end-to-end speech recognition model that fuses a word-level language model built on Chinese word-by-word translations: the language model is integrated into the decoding stage of the acoustic model for joint decoding, and the output is Tujia speech labeled as a Chinese token sequence. The model first builds a hybrid speech recognition model based on Attention-CTC; it then applies TransLM, a Transformer-based word-level language model whose modeling units, derived from lexical information, are word-level International Phonetic Alphabet (IPA) sequences, to output the translation sequence. Experiments on Tujia speech data show that, compared with the Attention-based and CTC-based models, the proposed model reduces WER by 10.3% and 9.6%, respectively. The work is a useful step toward future research on improving the accuracy of converting speech signals into IPA sequences.
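The joint decoding described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula or any weights, so the version below assumes a common formulation: a weighted interpolation of the attention and CTC log-probabilities plus a weighted language-model log-probability (often called shallow fusion). The function names, the weights `ctc_weight` and `lm_weight`, and the placeholder hypotheses are all hypothetical and are not taken from the paper.

```python
# Sketch of joint Attention-CTC + language-model scoring (assumed formulation,
# not the paper's confirmed method). All scores are log-probabilities.

def joint_score(log_p_att, log_p_ctc, log_p_lm, ctc_weight=0.3, lm_weight=0.5):
    """Interpolate attention and CTC scores, then add a weighted LM score."""
    acoustic = (1.0 - ctc_weight) * log_p_att + ctc_weight * log_p_ctc
    return acoustic + lm_weight * log_p_lm


def rescore(hypotheses, ctc_weight=0.3, lm_weight=0.5):
    """Return the token sequence with the highest joint score.

    hypotheses: list of (tokens, log_p_att, log_p_ctc, log_p_lm) tuples.
    """
    best = max(
        hypotheses,
        key=lambda h: joint_score(h[1], h[2], h[3], ctc_weight, lm_weight),
    )
    return best[0]


# Placeholder hypotheses with made-up scores, for illustration only.
hyps = [
    (["I", "go", "school"], -2.0, -2.5, -1.0),  # plausible word sequence
    (["I", "go", "snow"], -1.8, -2.4, -4.0),    # acoustically close, unlikely text
]

with_lm = rescore(hyps)                    # LM penalizes the unlikely sequence
without_lm = rescore(hyps, lm_weight=0.0)  # acoustic scores alone decide
print(with_lm, without_lm)
```

With these made-up numbers the acoustic scores alone prefer the second hypothesis, while adding the language-model term switches the choice to the first. This is the kind of correction that fusing a language model at decoding time is meant to provide.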

Endangered language; No script language; End-to-end speech recognition; Language model; Attention mechanism

Ruan Zheng (阮征), Yu Chongchong (于重重), Qian Zhaopeng (钱兆鹏), Wu Jiajia (吴佳佳)

School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China

2023 Graduate Research Capability Improvement Program project

21YJAZH107

2024

Computer Simulation (计算机仿真)
The 17th Research Institute of China Aerospace Science and Industry Corporation

CSTPCD
Impact factor: 0.518
ISSN:1006-9348
Year, volume (issue): 2024, 41(7)