Study on Speech Recognition of Chinese Children Based on DCNN-CTC

To address the limited ability of convolutional neural networks (CNN) to model speech signals, this paper proposes a Chinese children's speech recognition model based on a deep convolutional neural network and connectionist temporal classification (DCNN-CTC). The model uses CTC as its loss function. Residual skip connections introduced between the layers of the convolutional network pass the output of an earlier layer directly to a later layer, forming a set of residual convolutional layers and allowing the acoustic model to use more convolutional layers. The Mish and Maxout activation functions are then applied inside and outside the residual structure, respectively, to reduce network collapse and overfitting and thereby improve the efficiency of speech recognition. The results show that, compared with the conventional CNN, DCNN, and CTC speech recognition models, the DCNN-CTC model achieves the lowest phoneme error rate (PER) and word error rate (WER) on Chinese children's speech recognition.
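
The building blocks named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the 1-D convolution, the kernel values, the way Maxout combines several parallel residual units, and all function names (`conv1d_same`, `residual_unit`, `block_with_maxout`, `error_rate`) are assumptions made for illustration only. The sketch shows Mish applied inside a residual branch, a skip connection that adds the input back to the branch output, Maxout applied to the outputs of the residual units, and an edit-distance error rate of the kind that underlies PER and WER.

```python
import math


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    # softplus(x) = ln(1 + e^x), written in a numerically stable form
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(softplus)


def maxout(pieces):
    """Maxout: element-wise maximum over several candidate feature vectors."""
    return [max(values) for values in zip(*pieces)]


def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding; assumes an odd kernel length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]


def residual_unit(x, kernel_a, kernel_b):
    """y = x + conv_b(mish(conv_a(x))): Mish sits inside the residual branch."""
    hidden = [mish(v) for v in conv1d_same(x, kernel_a)]
    branch = conv1d_same(hidden, kernel_b)
    # Skip connection: the unit's input is added directly to the branch output.
    return [xi + bi for xi, bi in zip(x, branch)]


def block_with_maxout(x, params):
    """Assumed arrangement: Maxout over parallel residual units (outside them)."""
    return maxout([residual_unit(x, ka, kb) for ka, kb in params])


def error_rate(reference, hypothesis):
    """Levenshtein distance between token sequences, divided by len(reference)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, 1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, 1):
            curr.append(min(prev[j] + 1,                         # deletion
                            curr[j - 1] + 1,                     # insertion
                            prev[j - 1] + (ref_tok != hyp_tok)))  # substitution
        prev = curr
    return prev[-1] / len(reference)
```

With phoneme tokens, `error_rate` gives a PER-style score, and with word tokens a WER-style score. Because of the skip connection, a residual unit whose second kernel is all zeros returns its input unchanged, which is the property that lets extra convolutional layers be stacked without losing the signal from earlier layers.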

Keywords: convolutional neural network (CNN); connectionist temporal classification (CTC); residual skip connection; acoustic model

Dong Hu (董胡), Xia Mingxia (夏明霞), Li Yuanling (李垣陵)


School of Information Science and Engineering, Changsha Normal University, Changsha, Hunan 410100, China

2024

Automation Application (自动化应用)
Chongqing Southwest Information Co., Ltd. (重庆西南信息有限公司)


Impact factor: 0.156
ISSN: 1674-778X
Year, volume (issue): 2024, 65(23)