Bi-directional long short-term memory network-based synthetic voice detection method
The rapid development of artificial intelligence has led to the wide application of speech synthesis, but it has also created security problems such as identity forgery and fraud. In this paper, we propose an improved synthetic speech detection method based on deep learning and a bidirectional long short-term memory network (BiLSTM). Mel-frequency cepstral coefficient (MFCC) features are extracted and fed into a CNN-BiLSTM hybrid model, which combines the feature extraction capability of the CNN with the sequence processing capability of the BiLSTM to learn the differences between natural and synthetic speech, improving detection accuracy and robustness. Experiments on the ASVspoof 2019 and 2021 datasets show that the method achieves an equal error rate of about 5%, outperforming some existing techniques in detection accuracy and robustness.
synthetic voice detection; bidirectional long short-term memory network; deep learning
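The abstract reports results as an equal error rate (EER), the operating point at which the false acceptance rate (spoofed speech accepted as natural) equals the false rejection rate (natural speech rejected). The following is a minimal sketch of how such a value could be estimated from detector scores. It is not the authors' evaluation code: the function name, the convention that higher scores mean "more likely natural", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER by sweeping a decision threshold over all observed scores.

    Assumed convention (not from the paper): a higher score means the
    utterance is more likely to be natural (bona fide) speech, and an
    utterance is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False acceptance rate: spoofed utterances scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: bona fide utterances scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest
            # and report their average as the EER estimate.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, invented purely for illustration.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(bonafide, spoof))  # prints 0.25
```

In this toy case one of four bona fide utterances falls below the best threshold and one of four spoofed utterances falls above it, so both error rates are 25%. An EER of about 5%, as reported above, would mean the two error rates meet at roughly one error in twenty.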