Speech Emotion Recognition Based on Voice Rhythm Differences
Speech emotion recognition has important application prospects in financial anti-fraud and other fields, but improving its accuracy is increasingly difficult. Existing spectrogram-based speech emotion recognition methods struggle to capture rhythm-difference features, which limits recognition performance. Building on differences in speech rhythm features, this paper proposes a speech emotion recognition method based on energy frames and time-frequency fusion. The key idea is to screen the high-energy regions of the speech spectrum and to reflect individual voice rhythm differences through the distribution of high-energy speech frames and their time-frequency changes. On this basis, an emotion recognition model based on a convolutional neural network (CNN) and a recurrent neural network (RNN) is established to extract and fuse the temporal and frequency changes of the spectrum. Experiments on the open IEMOCAP dataset show that, compared with the spectrogram-based method, the weighted accuracy (WA) and unweighted accuracy (UA) of speech emotion recognition based on speech rhythm differences increase by 1.05% and 1.9% on average, respectively. The results also show that individual voice rhythm differences play an important role in improving speech emotion recognition.
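To make the described pipeline concrete, the sketch below illustrates one possible reading of it: frames are screened by spectral energy, and a small CNN branch (local time-frequency patterns) is fused with a GRU branch (temporal evolution of the retained frames). The keep ratio, network sizes, four-class IEMOCAP setup, and fusion by concatenation are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): screening high-energy frames from a
# spectrogram and feeding them to a small CNN+RNN time-frequency model.
# Threshold, layer sizes, and concatenation-based fusion are assumptions.
import numpy as np
import torch
import torch.nn as nn

def high_energy_frames(spec_mag: np.ndarray, keep_ratio: float = 0.6) -> np.ndarray:
    """Keep the frames whose energy lies in the top `keep_ratio` fraction.

    spec_mag: magnitude spectrogram, shape (n_freq, n_frames).
    Returns the screened spectrogram, shape (n_freq, n_kept_frames).
    """
    frame_energy = (spec_mag ** 2).sum(axis=0)           # energy per frame
    threshold = np.quantile(frame_energy, 1.0 - keep_ratio)
    keep = frame_energy >= threshold                      # boolean mask over frames
    return spec_mag[:, keep]

class TimeFreqEmotionNet(nn.Module):
    """CNN branch captures local time-frequency patterns; GRU branch captures
    the temporal evolution of the screened frames; both are fused for the
    final emotion prediction (four IEMOCAP classes assumed)."""
    def __init__(self, n_freq: int = 128, n_classes: int = 4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),   # -> 16*4*4 features
        )
        self.rnn = nn.GRU(input_size=n_freq, hidden_size=64, batch_first=True)
        self.head = nn.Linear(16 * 4 * 4 + 64, n_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, n_freq, n_frames)
        cnn_feat = self.cnn(spec.unsqueeze(1))            # (batch, 256)
        _, h = self.rnn(spec.transpose(1, 2))             # frames as time steps
        fused = torch.cat([cnn_feat, h[-1]], dim=1)       # time-frequency fusion
        return self.head(fused)

# Example: a random magnitude spectrogram standing in for one utterance.
spec = np.abs(np.random.randn(128, 300)).astype(np.float32)
screened = high_energy_frames(spec, keep_ratio=0.6)
logits = TimeFreqEmotionNet(n_freq=128)(torch.from_numpy(screened).unsqueeze(0))
print(logits.shape)   # torch.Size([1, 4])
```

In this sketch the screened frame sequence carries the rhythm information: utterances with different rhythms retain different numbers and positions of high-energy frames, which the GRU branch can pick up as variation in sequence length and temporal pattern.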