
Deep Learning Method for Speech Emotion Recognition Based on Improved K-Means Clustering

To address the low accuracy and high time complexity of current speech emotion recognition (SER) methods, this paper proposes a deep learning method for SER based on improved K-means clustering. An improved K-means clustering algorithm selects the key segments that reflect emotional features from the whole audio signal. The selected sequence is converted into a spectrogram by the short-time Fourier transform (STFT). A deep residual network (ResNet) and a deep bidirectional long short-term memory (Bi-LSTM) network then learn the emotion-related hidden features of the spectrogram in space and time, and a Softmax classifier produces the final emotion classification. Experimental results show that the proposed method has clear advantages over other recognition methods: it improves the emotion recognition rate while reducing the model's processing time.
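The first two stages of the pipeline (key-segment selection followed by the short-time Fourier transform) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the K-means algorithm is improved, so plain Lloyd's k-means stands in for it, and the per-segment features (mean energy and zero-crossing rate), the rule of keeping the segment nearest each centroid, and all function names are hypothetical. The ResNet and Bi-LSTM stages are omitted. Only the Python standard library is used, so the DFT is deliberately naive.

```python
import cmath
import math
import random


def split_segments(signal, seg_len):
    """Cut the signal into non-overlapping segments of seg_len samples."""
    return [signal[i:i + seg_len]
            for i in range(0, len(signal) - seg_len + 1, seg_len)]


def segment_features(seg):
    """Assumed features: mean energy and zero-crossing rate of one segment."""
    energy = sum(x * x for x in seg) / len(seg)
    zcr = sum(1 for a, b in zip(seg, seg[1:]) if a * b < 0) / (len(seg) - 1)
    return (energy, zcr)


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (NOT the paper's improved variant)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids


def select_key_segments(signal, seg_len, k):
    """Return sorted indices of the segments closest to each centroid."""
    feats = [segment_features(s) for s in split_segments(signal, seg_len)]
    centroids = kmeans(feats, k)
    chosen = {min(range(len(feats)), key=lambda i: math.dist(feats[i], c))
              for c in centroids}
    return sorted(chosen)


def stft_magnitude(samples, frame_len, hop):
    """Magnitude spectrogram: Hann-windowed frames, naive DFT per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    spec = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        spec.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * f * n / frame_len)
                    for n in range(frame_len)))
            for f in range(frame_len // 2 + 1)
        ])
    return spec  # frames x (frame_len // 2 + 1) magnitudes


# Toy input: 8 segments of 256 samples, alternating quiet/low and loud/high tones.
SEG = 256
signal = []
for s in range(8):
    amp, cycles = (0.1, 4) if s % 2 == 0 else (1.0, 32)
    signal += [amp * math.sin(2 * math.pi * cycles * n / SEG) for n in range(SEG)]

keys = select_key_segments(signal, SEG, k=2)
selected = [x for i in keys for x in split_segments(signal, SEG)[i]]
spectrogram = stft_magnitude(selected, frame_len=64, hop=32)
```

In a real system the spectrogram of the selected segments would be fed to the ResNet and Bi-LSTM feature extractors; the point of selecting segments first is that only a fraction of the audio has to be transformed and processed, which is where the reduction in processing time would come from.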

Keywords: speech emotion recognition; deep Bi-LSTM; K-means clustering; short-time Fourier transform

李巧君、郭彍


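School of Electronic Information Engineering, Henan Polytechnic Institute, Nanyang, Henan 473000

School of Electronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054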


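Funding: Key Scientific Research Project of Henan Higher Education Institutions (19A520022); Training Program for Young Backbone Teachers of Henan Higher Vocational Schools (教职成函[2019]326号)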

2024

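Computer Applications and Software (计算机应用与软件)
Shanghai Institute of Computing Technology; Shanghai Development Center of Computer Software Technology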


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.615
ISSN:1000-386X
Year, volume (issue): 2024, 41(9)