Error Rate Evaluation Method for DNN-Based Automatic Speech Recognition Systems


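Abstract (translated from the Chinese): Objective: To objectively assess the word error rate (WER) of automatic speech recognition (ASR) systems, meet the rehabilitation needs of people with impaired speech ability, and support the construction of an objective speech-ability assessment framework for special populations, this study proposes a method that predicts WER directly from the phoneme posterior probabilities output by a deep neural network (DNN), rather than computing the WER between a reference transcript and the transcript decoded by a hidden Markov model (HMM). Methods: Features are extracted from the speech signal and fed into a DNN model to compute phonetic posterior grams (PPG). Three performance measures that reflect the ASR system's WER are then computed from the PPG to make the prediction. Finally, the WER predictions obtained in four real acoustic scenarios are analyzed to verify the method's validity. In addition, 20 acoustic models of different depths and widths are built and compared to examine how model size affects prediction quality. Results: Across the WER evaluations of the 20 models, the network with 2 hidden layers of 512 neurons each achieves the smallest WER prediction error, yielding reliable WER predictions without the ASR decoding step. Conclusions: Performance measures based on phoneme probabilities can predict WER effectively and remove the dependence on reference transcripts and word labels.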
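For context, the reference-based WER that the proposed method avoids is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the decoded transcript, divided by the number of reference words. The following is a minimal sketch of that conventional calculation, not the authors' code; the function name `word_error_rate` and whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level Levenshtein distance / number of reference words.

    Assumption: both transcripts are whitespace-tokenized; real evaluations
    usually also normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because this calculation needs both a decoded transcript and a reference transcript, it cannot be used when no reference is available, which is the limitation the abstract's PPG-based prediction is meant to remove.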

English title: Error rate evaluation method of automatic speech recognition system based on DNN

English abstract: Objective: This study aims to assess the word error rate (WER) of automatic speech recognition (ASR) systems to support the rehabilitation needs of individuals with speech impairments. A novel approach is introduced, using phoneme posterior probabilities from a deep neural network (DNN) to predict WER, instead of calculating WER between reference transcripts and hidden Markov model (HMM)-decoded transcripts. Methods: Speech signals are processed through feature extraction and input to a DNN model, generating phonetic posterior grams (PPG). Three performance metrics derived from PPG reflect WER. The predicted WER data from real acoustic scenarios are analyzed for contrast and validation. Additionally, 20 diverse acoustic models are built and evaluated, investigating the impact of model size on prediction accuracy. Results: Among the evaluated models, a network with 2 hidden layers, each containing 512 neurons, achieves the most accurate WER prediction, bypassing ASR decoding and providing dependable results. Conclusions: Phoneme probability-based metrics effectively predict WER and remove dependency on reference transcripts and word labels.
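The abstract does not name the three PPG-derived metrics, so the sketch below should be read only as an illustration of the general idea: a PPG is a sequence of per-frame phoneme posterior distributions, and a reference-free score can be computed from how confident those distributions are. Mean per-frame entropy is used here purely as an assumed example measure, and the names `softmax` and `mean_frame_entropy` are hypothetical, not taken from the paper.

```python
import math


def softmax(logits):
    """Turn one frame of DNN output scores into phoneme posterior probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_frame_entropy(ppg):
    """Average Shannon entropy (in nats) over the frames of a PPG.

    ppg: list of frames, each a list of phoneme posteriors summing to 1.
    Assumption: higher entropy means a less certain acoustic model, which
    is expected (but not guaranteed) to go along with a higher WER.
    """
    if not ppg:
        raise ValueError("ppg must contain at least one frame")
    total = 0.0
    for frame in ppg:
        # Zero-probability phonemes contribute nothing to the entropy.
        total -= sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(ppg)


# Toy usage with made-up scores for 3 phoneme classes over 2 frames.
toy_ppg = [softmax([4.0, 0.5, 0.1]), softmax([0.2, 0.3, 0.1])]
toy_score = mean_frame_entropy(toy_ppg)
```

Turning such a score into a WER estimate would still require fitting a mapping on data where the true WER is known; the abstract does not describe that step, so it is left out here.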

English keywords:

deep neural network; hidden Markov model; automatic speech recognition system; system performance evaluation; phonetic posterior grams

Authors:

Wang Zihe (王梓赫), Zhang Peiming (张培茗), Si Boyu (司博宇)


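Author affiliations:

School of Health Science and Engineering, University of Shanghai for Science and Technology (Shanghai 200093)

College of Medical Instrumentation, Shanghai University of Medicine and Health Sciences (Shanghai 201318)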

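Keywords (translated from the Chinese):

deep neural network; hidden Markov model; automatic speech recognition system; system performance evaluation; phonetic posterior grams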

Publication year:

2024

DOI:

10.3969/j.issn.1002-3208.2024.06.009

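Journal: Beijing Biomedical Engineering (北京生物医学工程)

Sponsoring institution: Beijing Institute of Heart, Lung and Blood Vessel Diseases (北京市心肺血管疾病研究所)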


CSTPCD

Impact factor: 0.474

ISSN: 1002-3208

Year, volume (issue): 2024, 43(6)