The rapid development of artificial intelligence has led to the widespread application of speech synthesis technology, but it has also created security problems such as identity forgery and fraud. In this paper, we propose an improved synthetic speech detection method based on deep learning that combines a convolutional neural network (CNN) with a bidirectional long short-term memory network (BiLSTM). Mel-frequency cepstral coefficients (MFCCs) are extracted from the speech signal and fed into a hybrid CNN-BiLSTM model. The CNN performs feature extraction and the BiLSTM processes the resulting sequence in both temporal directions, so the model learns the differences between natural and synthetic speech, improving detection accuracy and robustness. Experiments on the ASVspoof 2019 and ASVspoof 2021 datasets show that the method achieves an equal error rate (EER) of about 5%, outperforming some existing techniques in detection accuracy and robustness.
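Performance above is reported as an equal error rate (EER): the error rate at the decision threshold where the false acceptance rate (spoofed speech accepted as natural) equals the false rejection rate (natural speech rejected). The following is a minimal sketch of how an EER can be estimated from detector scores by sweeping the threshold over the observed scores. It assumes that higher scores indicate bona fide (natural) speech; the function name `compute_eer` and this score convention are illustrative assumptions, and the official ASVspoof evaluation tools may differ in detail.

```python
from typing import Sequence, Tuple


def compute_eer(bonafide_scores: Sequence[float],
                spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Estimate (eer, threshold) by a simple threshold sweep.

    Assumption: a higher score means "more likely natural speech".
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    n_bona = len(bonafide_scores)
    n_spoof = len(spoof_scores)
    # Candidate thresholds: every distinct observed score.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))

    best_gap = float("inf")
    best = (1.0, thresholds[0])
    for t in thresholds:
        # False rejection rate: natural speech scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / n_bona
        # False acceptance rate: spoofed speech scored at or above it.
        far = sum(s >= t for s in spoof_scores) / n_spoof
        gap = abs(frr - far)
        if gap < best_gap:
            # At the closest crossing, report the mean of the two rates.
            best_gap = gap
            best = ((frr + far) / 2, t)
    return best


# Perfectly separated scores give an EER of zero.
print(compute_eer([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # -> (0.0, 0.7)
```

Averaging FAR and FRR at the threshold where they are closest is a common discrete approximation; with few trials the two rates rarely coincide exactly, so the reported EER lies between them.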
Keywords
synthetic speech detection/bidirectional long short-term memory network/deep learning