Research on Emotion Recognition in Speech Dialogue Based on Deep Learning
To address the complexity of speech emotion recognition, this article studies a deep-learning-based emotion recognition framework. First, features are extracted using Mel spectral coefficients, and audio data augmentation methods are introduced. Second, a Long Short-Term Memory (LSTM) network is used for emotion recognition. Finally, the method is tested on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The experimental results show that the method classifies speech samples accurately.
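The three stages named in the abstract (augmentation, Mel-based feature extraction, LSTM classification) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no hyperparameters, so the frame size, hop length, number of Mel bands, hidden size, and the choice of additive-noise augmentation are all assumptions made for illustration. The features here are log-Mel energies (a cepstral variant would add a discrete cosine transform), the LSTM weights are random and untrained, and a synthetic tone stands in for a RAVDESS utterance. The eight output classes assume the eight emotion labels of the RAVDESS speech recordings.

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    """Augmentation (assumed variant): add white Gaussian noise at a target SNR in dB."""
    noise_power = np.mean(signal ** 2) / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_features(signal, sr, n_fft=512, hop=256, n_mels=40):
    """Frame the signal, take the power spectrum, and project it onto the Mel filters."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_classify(features, W, U, b, W_out, b_out):
    """Run one LSTM layer over the frames, then softmax over the final hidden state."""
    h = np.zeros(U.shape[1])
    c = np.zeros(U.shape[1])
    for x in features:
        i, f, g, o = np.split(W @ x + U @ h + b, 4)  # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    logits = W_out @ h + b_out
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Usage with a synthetic 1 s tone in place of a real recording.
rng = np.random.default_rng(0)
sr = 16000
signal = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
noisy = add_noise(signal, snr_db=20, rng=rng)
features = log_mel_features(noisy, sr)
features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)

hidden, n_classes = 32, 8  # hypothetical sizes; untrained random weights
W = rng.normal(0.0, 0.1, (4 * hidden, features.shape[1]))
U = rng.normal(0.0, 0.1, (4 * hidden, hidden))
b = np.zeros(4 * hidden)
W_out = rng.normal(0.0, 0.1, (n_classes, hidden))
b_out = np.zeros(n_classes)
probs = lstm_classify(features, W, U, b, W_out, b_out)
```

With untrained weights the resulting probabilities carry no meaning; the sketch only shows how an utterance flows through the pipeline as a sequence of Mel feature frames, one LSTM step per frame, ending in a distribution over emotion classes. In practice the weights would be learned from labeled recordings, typically with a deep learning framework rather than hand-written NumPy.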