Study on Speech Recognition of Chinese Children Based on DCNN-CTC
To address the limited capability of convolutional neural networks (CNN) for modelling speech signals, a Chinese children's speech recognition model based on deep convolutional neural networks and connectionist temporal classification (DCNN-CTC) is proposed. The model uses CTC as its loss function and increases the number of convolutional layers in the acoustic model by introducing residual skip connections between the layers of the convolutional neural network, so that the output of an earlier layer is passed directly to a later layer; a set of residual convolutional layers is constructed in this way. Mish and Maxout activation functions are then applied inside and outside the residual structure, respectively, to reduce network collapse and overfitting and thus enhance the efficiency of speech recognition. The results show that, in Chinese children's speech recognition, the DCNN-CTC model achieves the lowest phoneme error rate (PER) and word error rate (WER) among the compared models, which include the traditional speech recognition models CNN, DCNN and CTC.
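The building blocks named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the use of 1-D convolutions on plain Python lists, the kernel sizes and the two-piece Maxout are all assumptions chosen for readability. The sketch shows a residual skip connection with Mish inside the residual branch, a Maxout applied outside it, and the CTC negative log-likelihood computed with the standard forward algorithm.

```python
import math


def mish(x):
    """Mish activation: x * tanh(softplus(x)), with a numerically stable softplus."""
    softplus = math.log1p(math.exp(-abs(x))) + max(x, 0.0)
    return x * math.tanh(softplus)


def maxout(pieces):
    """Maxout: element-wise maximum over k candidate feature vectors."""
    return [max(vals) for vals in zip(*pieces)]


def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding (odd kernel), output length == input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal))]


def residual_block(x, kernel_a, kernel_b):
    """y = x + F(x): the skip connection adds the input to the branch output.

    Mish is applied inside the residual branch F (assumed layout: conv, Mish, conv).
    """
    h = [mish(v) for v in conv1d_same(x, kernel_a)]
    h = conv1d_same(h, kernel_b)
    return [xi + hi for xi, hi in zip(x, h)]


def block_with_maxout(x, kernel_a, kernel_b, out_kernels):
    """Residual block followed by a Maxout over several linear projections (outside the block)."""
    y = residual_block(x, kernel_a, kernel_b)
    return maxout([conv1d_same(y, k) for k in out_kernels])


def ctc_neg_log_likelihood(probs, target, blank=0):
    """CTC loss via the forward algorithm.

    probs[t][c] is the per-frame probability of class c; target is a non-empty
    label sequence without blanks.
    """
    # Extended label sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)

    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping the intermediate blank is allowed only between different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank.
    return -math.log(alpha[-1] + alpha[-2])


# Hypothetical usage: a 5-frame feature track, two Maxout pieces, uniform CTC posteriors.
features = [0.2, -0.5, 1.0, 0.3, -0.1]
out = block_with_maxout(features, [0.1, 0.8, 0.1], [0.0, 0.5, 0.0],
                        [[0.0, 1.0, 0.0], [0.0, -1.0, 0.0]])
loss = ctc_neg_log_likelihood([[0.5, 0.5], [0.5, 0.5]], [1])
```

In a real acoustic model these operations would be multi-channel 2-D convolutions over spectrogram features and would be trained with a deep learning framework; the sketch only makes the data flow of the skip connection, the two activations and the CTC objective concrete.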