Computer Applications and Software (计算机应用与软件), 2024, Vol. 41, Issue 9: 224-229. DOI: 10.3969/j.issn.1000-386x.2024.09.032

DEEP LEARNING METHOD FOR SPEECH EMOTION RECOGNITION BASED ON IMPROVED K-MEAN CLUSTERING

李巧君 1, 郭彍 2

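Author information

  • 1. School of Electronic Information Engineering, Henan Polytechnic Institute, Nanyang, Henan 473000
  • 2. School of Electronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054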

Abstract

To address the low accuracy and high time complexity of current speech emotion recognition (SER) methods, a deep learning method for speech emotion recognition based on improved K-means clustering is proposed. An improved K-means clustering algorithm selects the key segments that reflect emotional features from the whole audio signal. The selected sequence is converted into a spectrogram by the short-time Fourier transform (STFT). A deep residual network (ResNet) and a deep bidirectional long short-term memory (Bi-LSTM) network then learn, spatially and temporally, the hidden emotion-related features in the spectrogram, and a Softmax classifier produces the final emotion classification. Experimental results show that the proposed method has clear advantages over other recognition methods: it improves the emotion recognition rate while reducing the model's processing time.
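The first two stages described in the abstract (key-segment selection by clustering, then a short-time Fourier transform of the selected audio) can be illustrated with a minimal, standard-library-only sketch. It rests on several assumptions: the abstract does not say how the K-means algorithm is improved or which segment features are clustered, so the code uses plain Lloyd's K-means and a hypothetical two-value descriptor (mean absolute amplitude and zero-crossing rate). All function names and parameters are illustrative, and the ResNet/Bi-LSTM/Softmax stage is omitted.

```python
import cmath
import math
import random


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into frames of frame_len samples, hop samples apart."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_centers(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means (NOT the paper's improved variant); returns k centers."""
    centers = [list(p) for p in random.Random(seed).sample(points, k)]
    for _ in range(iters):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def segment_features(segment):
    """Hypothetical descriptor: mean absolute amplitude and zero-crossing rate."""
    amplitude = sum(abs(x) for x in segment) / len(segment)
    zcr = sum(1 for a, b in zip(segment, segment[1:]) if a * b < 0) / len(segment)
    return [amplitude, zcr]


def select_key_segments(segments, k):
    """Indices (in time order) of the segment closest to each K-means center."""
    feats = [segment_features(s) for s in segments]
    centers = kmeans_centers(feats, k)
    return sorted({min(range(len(feats)), key=lambda i: sq_dist(feats[i], c))
                   for c in centers})


def stft_magnitude(signal, n_fft, hop):
    """Magnitude spectrogram from a Hann-windowed direct DFT of each frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    spec = []
    for frame in frame_signal(signal, n_fft, hop):
        xw = [x * w for x, w in zip(frame, window)]
        spec.append([abs(sum(xw[n] * cmath.exp(-2j * math.pi * f * n / n_fft)
                             for n in range(n_fft)))
                     for f in range(n_fft // 2 + 1)])
    return spec  # rows are frames, columns are frequency bins 0..n_fft/2


# Toy input: silence, a 100 Hz tone sampled at 800 Hz, then silence again.
sr, seg_len = 800, 64
tone = [math.sin(2 * math.pi * 100 * t / sr) for t in range(4 * seg_len)]
signal = [0.0] * (2 * seg_len) + tone + [0.0] * (2 * seg_len)

segments = frame_signal(signal, seg_len, seg_len)    # non-overlapping segments
key = select_key_segments(segments, k=2)             # one representative per cluster
selected = [x for i in key for x in segments[i]]     # concatenate the key segments
spec = stft_magnitude(selected, n_fft=16, hop=8)     # spectrogram for a downstream model
```

Keeping only the segment nearest each cluster center is one simple way to shorten the input before the spectrogram is computed, which is in the spirit of the processing-time reduction the abstract reports. The direct DFT here is quadratic in `n_fft` and is meant for readability only; a real pipeline would use an FFT.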

Key words

Speech emotion recognition / Deep Bi-LSTM / K-means clustering / Short-time Fourier transform


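Funding

Key Scientific Research Project of Higher Education Institutions of Henan Province (19A520022)

Young Backbone Teacher Training Program for Higher Vocational Schools of Henan Province (教职成函[2019]326号)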

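Publication year

2024

Journal: Computer Applications and Software (计算机应用与软件)
Publisher: 上海市计算技术研究所 (Shanghai Institute of Computing Technology); 上海计算机软件技术开发中心 (Shanghai Development Center of Computer Software Technology)
Indexed in: CSTPCD, Peking University Core Journals (北大核心)
Impact factor: 0.615
ISSN: 1000-386X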