A keyword retrieval method for Tibetan speech based on a deep neural network
As a basic research topic in human-computer voice interaction, speech keyword recognition aims to extract specific keywords from continuous speech signals in order to wake up a target device or trigger other related functions. This article proposes a keyword detection method for the Amdo dialect of Tibetan based on the DNN-HMM acoustic model. First, the speech data were preprocessed by cutting and converting. Second, MFCC features were extracted from the speech signal as the input of the model. Finally, GMM-HMM and DNN-HMM models were each used to model the acoustic characteristics of Tibetan. In addition, to improve the expressiveness and generalization ability of the model, this article introduces pre-training and fine-tuning techniques and optimizes the structure of the model. The experimental results show that, compared with the traditional GMM-HMM acoustic model, the keyword detection method based on the DNN-HMM acoustic model detects Tibetan speech keywords more effectively.
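The MFCC feature extraction step mentioned above can be illustrated with a minimal NumPy sketch. It is not the article's implementation: the sampling rate (16 kHz), frame length (25 ms), frame shift (10 ms), pre-emphasis coefficient (0.97), number of mel filters (26), and number of cepstral coefficients (13) are all commonly used values assumed here for illustration, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):      # falling edge
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13):
    """Return an array of shape (n_frames, n_ceps). All defaults are assumed values."""
    signal = np.asarray(signal, dtype=float)
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    if len(signal) < flen:
        raise ValueError("signal is shorter than one frame")
    # Pre-emphasis boosts high frequencies before framing.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - flen) // fstep
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)
    # Power spectrum, mel filterbank energies, then log compression.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # Unnormalized DCT-II decorrelates the log energies; keep the first n_ceps.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T

# Example: one second of synthetic noise standing in for a speech clip.
features = mfcc(np.random.default_rng(0).standard_normal(16000))
```

Each row of `features` is one frame's feature vector. In a GMM-HMM or DNN-HMM system, such vectors (often extended with delta features or neighboring frames) serve as the acoustic model input.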