
Advancements and Prospects in Dysarthria Speaker Adaptation

Automatic speech recognition tools can make communication between people with dysarthria and typical speakers smoother, and dysarthric speech recognition has therefore become an active research topic in recent years. Research in this area involves collecting speech data from speakers with and without dysarthria, representing both kinds of speech with acoustic features, and using machine learning models to compare and recognize the spoken content and to locate the differences, so as to help people with dysarthria improve their pronunciation. However, because large amounts of dysarthric speech are very difficult to collect and dysarthric pronunciation is highly variable, general-purpose speech recognition models often perform poorly. To address this problem, many studies have introduced speaker adaptation methods into dysarthric speech recognition. A survey of the extensive related literature shows that current work analyzes dysarthric speech mainly in the feature domain and the model domain. This paper focuses on how feature transformation and auxiliary features address the differing representations of speech features, and on how linear transformation of acoustic models, fine-tuning of acoustic model parameters, and domain adaptation based on data selection improve recognition accuracy. Finally, it summarizes the problems that dysarthric speaker adaptation research currently faces and points out that future work can improve the effectiveness of dysarthric speech recognition models through analysis of speech variability, fusion of multi-feature and multi-modal data, and adaptation methods that work with small amounts of data.

Dysarthria; Speaker adaptation; Auxiliary features; Transformation; Fine-tuning; Domain adaptation

Kang Xinchen (康新晨), Dong Xueyan (董雪燕), Yao Dengfeng (姚登峰), Zhong Jinghua (钟经华)


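Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101

Computational Linguistics Laboratory, School of Humanities, Tsinghua University, Beijing 100084

Center for Psychology and Cognitive Science, Tsinghua University, Beijing 100084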


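Beijing Natural Science Foundation; Project of the State Language Commission; National Natural Science Foundation of China; National Social Science Fund of China; National Social Science Fund of China; 2019 General Science and Technology Project of the Beijing Municipal Education Commission

4202028; YB145-25; 62036001; 21BYY106; 21&ZD292; KM201911417005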

2024

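Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)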


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.944
ISSN: 1002-137X
Year, volume (issue): 2024, 51(8)