A Deep Learning Method for Speech Emotion Recognition Based on Improved K-Means Clustering

English title (as published): DEEP LEARNING METHOD FOR SPEECH EMOTION RECOGNITION BASED ON IMPROVED K-MEAN CLUSTERING

Abstract: To address the low accuracy and high time complexity of current speech emotion recognition (SER) methods, a deep learning method for speech emotion recognition based on improved K-means clustering is proposed. An improved K-means clustering algorithm selects, from the whole audio signal, the key segments that reflect emotional features. The selected sequence is converted into a spectrogram with the short-time Fourier transform. The deep residual model ResNet and a deep bidirectional long short-term memory (Bi-LSTM) network then learn, spatially and temporally, the hidden emotion-related features represented in the spectrogram, and a Softmax classifier produces the final emotion classification. Experimental results show that the proposed method has clear advantages over other recognition methods: it improves the emotion recognition rate while reducing the model's processing time.
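The front end of the pipeline described in the abstract (key-segment selection followed by a time-frequency transform) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the improved K-means variant, so plain Lloyd's K-means is used instead; the per-frame features (log energy and zero-crossing rate), the frame sizes, and all function names are hypothetical; and the ResNet, Bi-LSTM, and Softmax stages are omitted.

```python
import cmath
import math
import random


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping fixed-length frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def frame_features(frame):
    """Toy per-frame descriptor (an assumption): log energy, zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return (math.log(energy + 1e-12), crossings / (len(frame) - 1))


def dist2(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_centroids(points, k, iters=20, seed=0):
    """Standard Lloyd's K-means (not the paper's improved variant)."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; leave empty clusters as-is.
        centroids = [tuple(sum(d) / len(m) for d in zip(*m)) if m else centroids[c]
                     for c, m in enumerate(clusters)]
    return centroids


def select_key_frames(points, centroids):
    """Indices of the frames closest to each centroid, in temporal order."""
    picks = {min(range(len(points)), key=lambda i: dist2(points[i], c))
             for c in centroids}
    return sorted(picks)


def magnitude_spectrum(frame):
    """Hann-windowed DFT magnitude of one frame (one STFT column), bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * b * i / n)
                    for i, x in enumerate(windowed)))
            for b in range(n // 2 + 1)]


# Synthetic stand-in for speech: a quiet tone followed by a louder, higher tone.
signal = ([0.1 * math.sin(2 * math.pi * 4 * i / 64) for i in range(256)] +
          [1.0 * math.sin(2 * math.pi * 8 * i / 64) for i in range(256)])
frames = frame_signal(signal, frame_len=64, hop=32)
features = [frame_features(f) for f in frames]
key_idx = select_key_frames(features, kmeans_centroids(features, k=2))
spectrogram = [magnitude_spectrum(frames[i]) for i in key_idx]
print(key_idx, len(spectrogram), len(spectrogram[0]))
```

In this sketch each cluster contributes its most representative frame, and only those frames are transformed, which is one plausible way that segment selection could reduce downstream processing time. A practical system would more likely use a library STFT and learned or spectral features rather than the naive DFT and hand-picked descriptors shown here.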

English keywords:

Speech emotion recognition; Deep Bi-LSTM; K-means clustering; Short-time Fourier transform

Authors:

李巧君, 郭彍

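Author affiliations:

School of Electronic Information Engineering, Henan Polytechnic Institute, Nanyang, Henan 473000

School of Electronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054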

Keywords:

speech emotion recognition; deep bidirectional long short-term memory; K-means clustering; short-time Fourier transform

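Funding:

Key Scientific Research Project of Higher Education Institutions of Henan Province; Training Program for Young Backbone Teachers of Higher Vocational Schools of Henan Province

Project numbers:

19A520022; 教职成函[2019]326号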

Publication year:

2024

DOI:

10.3969/j.issn.1000-386x.2024.09.032

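Computer Applications and Software (计算机应用与软件)

Shanghai Institute of Computing Technology; Shanghai Development Center of Computer Software Technology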

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.615

ISSN: 1000-386X

Year, volume (issue): 2024, 41(9)