A Keyword Retrieval Method for Tibetan Speech Based on Deep Neural Networks (基于深度神经网络的藏语语音关键词检索方法)

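Abstract (translated from Chinese): Speech keyword recognition is a basic research topic in human-computer voice interaction. Its goal is to extract specific keywords from a continuous speech signal and use them to wake a target device or trigger other related functions. This article proposes a keyword detection method for the Ü-Tsang dialect of Tibetan based on a DNN-HMM acoustic model. First, the speech data are preprocessed by cutting and conversion. Second, MFCC is used to extract effective features from the speech signal as the model input. Third, GMM-HMM and DNN-HMM models are each used to model the Tibetan acoustic features. In addition, to improve the expressiveness and generalization ability of the model, pre-training and fine-tuning are introduced and the model structure is optimized. The experimental results show that, compared with recognition based on the traditional GMM-HMM acoustic model, the keyword detection method based on the DNN-HMM acoustic model detects Tibetan speech keywords more effectively.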
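The MFCC feature-extraction step named in the abstract can be illustrated with a minimal NumPy sketch. The record does not give the authors' settings, so every parameter here is an assumption based on common speech-recognition defaults: a 16 kHz sample rate, 25 ms frames with a 10 ms step, a 512-point FFT, 26 mel filters, and 13 cepstral coefficients. The function names `mfcc` and `mel_filterbank` are hypothetical and are not taken from the article.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13, preemph=0.97):
    """Return MFCC features of shape (n_frames, n_ceps) for a non-empty 1-D signal.

    All default values are assumed conventions, not the article's settings.
    """
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])

    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    if len(emphasized) < flen:
        # Zero-pad very short signals so that one full frame exists.
        emphasized = np.pad(emphasized, (0, flen - len(emphasized)))
    n_frames = 1 + (len(emphasized) - flen) // fstep

    # Slice into overlapping frames and apply a Hamming window.
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)

    # Power spectrum, mel filterbank energies, then log compression.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))

    # Unnormalized DCT-II over the filter axis; keep the first n_ceps coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T
```

Calling `mfcc(samples)` on a one-second 16 kHz recording gives one 13-dimensional vector per 10 ms step. In a GMM-HMM or DNN-HMM system these vectors, often extended with delta features and context frames, serve as the acoustic model input.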

English title: A keyword retrieval method for Tibetan speech based on deep neural network

English abstract: As a basic research topic of human-computer voice interaction, speech keyword recognition aims to extract specific keywords from continuous speech signals and realize the wake-up of the target device and other related functions. This article proposes a keyword detection method for the Ü-Tsang dialect of Tibetan based on the DNN-HMM acoustic model. Firstly, the speech data was preprocessed by cutting and converting. Secondly, MFCC was used to extract effective features from the speech signal as the input of the model. Thirdly, the GMM-HMM and DNN-HMM models were used to model the acoustic characteristics of the Tibetan language respectively. At the same time, in order to improve the expressiveness and generalization ability of the model, this article introduces pre-training and fine-tuning techniques into the model and optimizes the structure of the model. The experimental results show that, compared with the traditional recognition results based on the GMM-HMM acoustic model, the keyword detection method based on the DNN-HMM acoustic model can detect Tibetan speech keywords more effectively.
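As background for the DNN-HMM acoustic model named in the abstract: in the standard hybrid setup, the network outputs per-frame posteriors over HMM states, and these are divided by the state priors to obtain scaled likelihoods that an HMM decoder can use. The sketch below shows that general technique with a small Viterbi decoder. It is not the authors' implementation. The function names, the two-state toy HMM, and all probabilities are hypothetical values chosen for illustration.

```python
import numpy as np


def log_softmax(logits):
    """Row-wise log-softmax, computed stably by subtracting the row maximum."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def scaled_log_likelihoods(logits, state_priors):
    """Convert DNN outputs to scaled log-likelihoods.

    log p(x | s) = log p(s | x) - log p(s) + const, where the constant
    log p(x) is the same for every state and does not affect decoding.
    """
    return log_softmax(logits) - np.log(state_priors)


def viterbi(log_obs, log_trans, log_init):
    """Return the most likely state path and its log score.

    log_obs:   (n_frames, n_states) per-frame observation scores
    log_trans: (n_states, n_states), entry [i, j] is the score of moving i -> j
    log_init:  (n_states,) initial state scores
    """
    n_frames, n_states = log_obs.shape
    score = log_init + log_obs[0]
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] + log_trans      # cand[i, j]: best path ending in i, then i -> j
        back[t] = cand.argmax(axis=0)          # best predecessor for each state j
        score = cand.max(axis=0) + log_obs[t]
    path = [int(score.argmax())]
    for t in range(n_frames - 1, 0, -1):       # follow back-pointers to frame 0
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(score.max())
```

A keyword detector built on such a model would typically compare the score of the keyword's state sequence against a background or filler model. The record does not say which decision rule the article uses.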

English keywords:

Acoustic model; Tibetan; Deep learning; Keyword detection; Speech recognition

Authors:

张恒, 拉巴顿珠, 官政先, 肖鑫

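Author affiliation:

School of Information Science and Technology, Tibet University; Provincial-Ministerial Collaborative Innovation Center for Tibet Informatization, Lhasa 850000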

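Keywords (translated from Chinese):

acoustic model (声学模型); Tibetan (藏语); deep learning (深度学习); keyword detection (关键词检测); speech recognition (语音识别)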

Funding:

2022 Tibet University Undergraduate Innovative Experiment Training Program Project

Project number:

2022XCX085

Publication year:

2024

Journal: Tibet Science and Technology (西藏科技)

Tibet Institute of Scientific and Technical Information (西藏科技信息研究所)

Impact factor: 0.202

ISSN: 1004-3403

Year, volume (issue): 2024, 46(6)

Number of references: 13