Software Guide, 2024, Vol. 23, Issue (9): 76-81. DOI: 10.11907/rjdk.232097

Research on Low-Resource Speech Recognition Based on ADTDNN

Gu Longhao¹, Huang Lianli², Zhou Kui³, Zhang Ziyue¹

Author information

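  • 1. School of Electrical and Information Engineering, Hubei University of Automotive Technology; Sharing-X Key Joint Laboratory, College of Automotive Engineers, Hubei University of Automotive Technology, Shiyan, Hubei 442002, China
  • 2. School of Electrical and Information Engineering, Hubei University of Automotive Technology
  • 3. Sharing-X Key Joint Laboratory, College of Automotive Engineers, Hubei University of Automotive Technology, Shiyan, Hubei 442002, China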

Abstract

To address the reduced recognition accuracy and poor generalization caused by insufficient training data under low-resource conditions, a speech recognition method is proposed. The method extracts feature information with convolution and pooling, and fuses the attention mechanism with DTDNN to form ADTDNN, improving the model's ability to capture key information in sequences in low-resource settings. Connectionist temporal classification (CTC) is used to simplify the model's recognition pipeline, and a Transformer serves as the language model. Experimental results on the Aishell-1 dataset show that, under low-resource conditions, the ADTDNN-based speech recognition model reduces the character error rate by 3.7% and 1.0% compared with mainstream end-to-end models such as LAS and Transformer, respectively.
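The method relies on CTC to simplify decoding and reports results as character error rates (CER) on Aishell-1. The sketch below illustrates both pieces under stated assumptions: best-path (greedy) CTC decoding over frame-level label indices, with index 0 assumed to be the blank symbol, and CER computed as the Levenshtein distance between character sequences divided by the reference length. It is not the authors' implementation; the function names are hypothetical, and the Transformer language model is not modeled here.

```python
from typing import List, Sequence

BLANK = 0  # assumption: label index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_labels: Sequence[int], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: merge consecutive repeats, then drop blanks."""
    output: List[int] = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame's label
        # and is not the blank symbol.
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions + deletions + insertions)."""
    # Single-row dynamic programming; row[j] holds the distance between
    # the first i reference symbols and the first j hypothesis symbols.
    row = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        diagonal, row[0] = row[0], i
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            diagonal, row[j] = row[j], min(
                row[j] + 1,       # deletion
                row[j - 1] + 1,   # insertion
                diagonal + cost,  # substitution or match
            )
    return row[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = character-level edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0])` returns `[3, 3, 5]`: the blank between the two 3s keeps them as separate tokens, while the repeated 5s merge into one. `character_error_rate("今天天气很好", "今天天汽好")` is 2/6 ≈ 0.333 (one substitution and one deletion over six reference characters).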

Key words

speech recognition/time delay neural networks/Transformer/data augmentation/low resource

Publication year: 2024
Journal: Software Guide
Sponsor: Hubei Information Society
Impact factor: 0.524
ISSN: 1672-7800