Research on Chinese Children's Speech Recognition Based on DCNN-CTC

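Abstract (translated from the Chinese): To address the insufficient ability of convolutional neural networks (CNNs) to model speech signals, this paper proposes a Chinese children's speech recognition model based on a deep convolutional neural network and a connectionist temporal classifier (DCNN-CTC). The model uses CTC as its objective loss function. Residual skip connections are introduced between the layers of the convolutional neural network so that the output of a preceding layer is passed directly to a subsequent layer; this forms a set of residual convolutional layers and increases the number of convolutional layers in the acoustic model. The Mish and Maxout activation functions are then applied inside and outside the residual structure, respectively, to reduce network collapse and overfitting and thereby improve the efficiency of speech recognition. The results show that, compared with the conventional speech recognition models CNN, DCNN and CTC, the DCNN-CTC model achieves the lowest phoneme error rate (PER) and word error rate (WER) in Chinese children's speech recognition.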
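The residual unit described in the abstract (a skip connection around convolutional layers, Mish inside the residual structure, Maxout outside it) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no kernel sizes, channel counts, or exact activation positions, so the two-convolution branch, odd kernel width with "same" zero padding, and Maxout over groups of `k = 2` adjacent channels are hypothetical choices, as are the function names `mish`, `maxout`, `conv1d` and `residual_block`. Mish is written as x·tanh(softplus(x)).

```python
import math
from typing import List

Matrix = List[List[float]]  # features indexed as [time][channel]


def mish(x: float) -> float:
    """Mish(x) = x * tanh(softplus(x)), using an overflow-safe softplus."""
    sp = x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))
    return x * math.tanh(sp)


def maxout(frame: List[float], k: int) -> List[float]:
    """Maxout: keep the maximum of each group of k adjacent channels."""
    if len(frame) % k != 0:
        raise ValueError("channel count must be divisible by k")
    return [max(frame[i:i + k]) for i in range(0, len(frame), k)]


def conv1d(x: Matrix, w: List[Matrix], b: List[float]) -> Matrix:
    """1-D convolution over time with 'same' zero padding.

    w[o][i][j] is the weight for output channel o, input channel i, tap j
    (an odd kernel width is assumed).
    """
    T, width = len(x), len(w[0][0])
    pad = width // 2
    out = []
    for t in range(T):
        row = []
        for o in range(len(w)):
            s = b[o]
            for i in range(len(x[t])):
                for j in range(width):
                    tt = t + j - pad
                    if 0 <= tt < T:  # taps outside the utterance see zeros
                        s += w[o][i][j] * x[tt][i]
            row.append(s)
        out.append(row)
    return out


def residual_block(x: Matrix, w1, b1, w2, b2, k: int = 2) -> Matrix:
    """Hypothetical residual unit: Maxout(x + conv2(Mish(conv1(x))))."""
    # Residual branch, with Mish applied inside the residual structure
    h = [[mish(v) for v in row] for row in conv1d(x, w1, b1)]
    f = conv1d(h, w2, b2)
    if len(f[0]) != len(x[0]):
        raise ValueError("branch must preserve the channel count for the skip add")
    # Skip connection: the block input is added directly to the branch output
    y = [[xv + fv for xv, fv in zip(xr, fr)] for xr, fr in zip(x, f)]
    # Maxout applied outside the residual structure
    return [maxout(row, k) for row in y]
```

Because the skip path carries `x` unchanged, a block whose branch outputs zeros reduces to Maxout of its input; this identity path is the usual motivation for residual connections when stacking more convolutional layers.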

English title: Study on Speech Recognition of Chinese Children Based on DCNN-CTC

English abstract: To address the limited capability of convolutional neural networks (CNNs) in modelling speech signals, a Chinese children's speech recognition model based on a deep convolutional neural network and connectionist temporal classification (DCNN-CTC) is proposed. The model takes CTC as its objective loss function. Residual skip connections are introduced between the layers of the convolutional neural network so that the output of an earlier layer is passed directly to a later layer; this builds a set of residual convolutional layers and increases the number of convolutional layers in the acoustic model. Mish and Maxout activation functions are then applied inside and outside the residual structure, respectively, to reduce network collapse and overfitting and thus improve the efficiency of speech recognition. The results show that, compared with the conventional speech recognition models CNN, DCNN and CTC, the DCNN-CTC model has the lowest phoneme error rate (PER) and word error rate (WER) in Chinese children's speech recognition.
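The CTC objective and the PER/WER metrics named in the abstract can likewise be sketched in plain Python. This is an illustrative sketch, not the paper's code, and several details are assumptions: symbol index 0 is the blank, the inputs are already softmax-normalised per frame, the forward algorithm is kept in probability space for readability (real utterances need log space to avoid underflow), decoding is greedy best-path rather than beam search, and the function names `ctc_loss`, `greedy_decode` and `error_rate` are hypothetical. The error rate is the standard Levenshtein distance (substitutions + deletions + insertions) divided by the reference length; applied to phoneme sequences it gives PER, and applied to word sequences it gives WER.

```python
import math
from typing import List, Sequence

BLANK = 0  # assumption: symbol index 0 is the CTC blank


def ctc_loss(probs: List[List[float]], labels: Sequence[int]) -> float:
    """-ln p(labels | x) computed with the CTC forward algorithm.

    probs[t][k] is the (softmax) probability of symbol k at frame t.
    """
    ext = [BLANK]
    for lab in labels:  # interleave blanks: b, l1, b, l2, ..., b
        ext += [lab, BLANK]
    S = len(ext)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for frame in probs[1:]:
        new = [0.0] * S
        for s in range(S):
            a = alpha[s] + (alpha[s - 1] if s >= 1 else 0.0)
            # A blank may be skipped only between two different labels
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * frame[ext[s]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank
    total = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(total)  # raises ValueError if no valid alignment exists


def greedy_decode(probs: List[List[float]]) -> List[int]:
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    out, prev = [], None
    for frame in probs:
        k = max(range(len(frame)), key=frame.__getitem__)
        if k != prev and k != BLANK:
            out.append(k)
        prev = k
    return out


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Levenshtein distance (S + D + I) divided by the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    d = list(range(len(hyp) + 1))  # distances for the previous reference prefix
    for i, r in enumerate(ref, 1):
        diag, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1, d[j - 1] + 1, diag + (r != h))
            diag, d[j] = d[j], cur
    return d[-1] / len(ref)
```

Greedy decoding is the simplest CTC decoder; beam search, optionally with a language model, is common in practice, and the abstract does not state which decoder the authors used.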

English keywords:

CNN; CTC; residual jump; acoustic model

Authors:

Dong Hu, Xia Mingxia, Li Yuanling

Affiliation:

School of Information Science and Engineering, Changsha Normal University, Changsha, Hunan 410100

Keywords:

convolutional neural network; connectionist temporal classifier; residual skip; acoustic model

Year of publication:

2024

DOI: