A DNN-based method for evaluating the error rate of automatic speech recognition systems
Objective: This study aims to assess the word error rate (WER) of automatic speech recognition (ASR) systems to support the rehabilitation needs of individuals with speech impairments. A novel approach is introduced that predicts WER from the phoneme posterior probabilities of a deep neural network (DNN), instead of calculating WER between reference transcripts and hidden Markov model (HMM)-decoded transcripts. Methods: Speech signals undergo feature extraction and are input to a DNN model, which generates phonetic posteriorgrams (PPG). Three performance metrics derived from the PPG are used to reflect WER. The WER predicted for data from real acoustic scenarios is analyzed for comparison and validation. Additionally, 20 different acoustic models are built and evaluated to investigate the impact of model size on prediction accuracy. Results: Among the evaluated models, a network with 2 hidden layers of 512 neurons each achieves the most accurate WER prediction, bypassing ASR decoding while providing dependable results. Conclusions: Phoneme probability-based metrics effectively predict WER and remove the dependency on reference transcripts and word labels.
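
To make the contrast between the two evaluation routes concrete, the following minimal Python sketch shows (a) the conventional WER, computed as the word-level edit distance between a reference transcript and a decoded transcript divided by the reference length, and (b) one possible reference-free score computed directly from a PPG. The abstract does not name its three PPG-derived metrics, so `mean_frame_entropy` is a hypothetical stand-in chosen for illustration only, and the function names and the list-of-lists PPG layout (one probability vector per frame) are likewise assumptions.

```python
import math


def word_error_rate(reference, hypothesis):
    """Conventional WER: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def mean_frame_entropy(ppg):
    """Hypothetical PPG score (an assumption, not one of the paper's metrics):
    average entropy, in nats, of the per-frame phoneme posterior distributions."""
    if not ppg:
        raise ValueError("ppg must contain at least one frame")
    total = 0.0
    for frame in ppg:
        total -= sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(ppg)


if __name__ == "__main__":
    # Needs a reference transcript and a decoded hypothesis.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Needs only the DNN output: two frames over two phoneme classes.
    print(mean_frame_entropy([[1.0, 0.0], [0.5, 0.5]]))
```

In this sketch a fully confident frame contributes zero entropy and a uniform frame contributes the logarithm of the number of phoneme classes. A score of this kind needs neither a reference transcript nor a decoding pass, which is the property the proposed approach relies on; how such a score maps to WER would still have to be established empirically.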