Speech Emotion Recognition Algorithm Based on MHA-ResNet

A key challenge in speech emotion recognition is extracting key features from speech signals to improve recognition accuracy. Building on existing research, this paper proposes a speech emotion recognition model based on a Multi-Head-Attention Residual Network (MHA-ResNet). First, the raw speech signal is preprocessed and an emotional feature set is extracted. Second, the multi-head attention mechanism, which processes the features in parallel and adaptively focuses on the informative parts, is applied to the feature set to obtain preliminary discriminative emotional information across different states. Finally, a residual network captures deeper emotional features and completes the recognition of the different emotions. To validate the model, experiments were conducted on the CASIA and EmoDB datasets, yielding recognition accuracies of 93.59% and 97.57%, respectively.
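The processing order described in the abstract (multi-head attention over a sequence of frame-level features, then a residual stage, then classification) can be illustrated with a toy forward pass. This is only a minimal sketch under stated assumptions, not the authors' model: the weights are random and untrained, the residual stage is a single fully connected unit rather than a real ResNet, and the function names, the feature size, the number of heads, and `n_classes` are all hypothetical placeholders.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(x, heads, rng):
    """x holds T frames of D features; every head attends over all frames."""
    d_model = len(x[0])
    if d_model % heads != 0:
        raise ValueError("feature size must be divisible by the number of heads")
    d_head = d_model // heads
    head_outputs = []
    for _ in range(heads):
        wq, wk, wv = (rand_matrix(d_model, d_head, rng) for _ in range(3))
        q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
        k_t = [list(col) for col in zip(*k)]
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_head)) V
        scores = matmul(q, k_t)
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v))
    # Concatenate the heads along the feature axis, frame by frame.
    return [sum((head[t] for head in head_outputs), []) for t in range(len(x))]


def residual_block(x, w):
    """y = x + ReLU(x W): the identity skip connection of a residual unit."""
    h = matmul(x, w)
    return [[xi + max(0.0, hi) for xi, hi in zip(xr, hr)] for xr, hr in zip(x, h)]


def emotion_probabilities(features, n_classes=6, heads=4, seed=0):
    """Attention -> residual unit -> mean pooling over time -> softmax classifier."""
    rng = random.Random(seed)
    d_model = len(features[0])
    attended = multi_head_self_attention(features, heads, rng)
    deep = residual_block(attended, rand_matrix(d_model, d_model, rng))
    pooled = [sum(col) / len(deep) for col in zip(*deep)]
    logits = matmul([pooled], rand_matrix(d_model, n_classes, rng))[0]
    return softmax(logits)


# Stand-in input: 10 frames x 8 features (a real system would use extracted features).
frames = rand_matrix(10, 8, random.Random(1))
probs = emotion_probabilities(frames)
print(len(probs), round(sum(probs), 6))
```

Because the weights are random, the output is only a valid probability vector over the placeholder classes, not a meaningful prediction; a practical implementation would rely on a deep learning framework and trained parameters.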

Keywords: speech emotion recognition; multi-head attention mechanism; residual network; emotional feature set

Zhou Chuanhua (周传华), Hao Min (郝敏), Zeng Hui (曾辉), Wang Yong (王勇)

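School of Management Science and Engineering, Anhui University of Technology, Ma'anshan, Anhui 243002, China

School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230026, China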

Funding: National Natural Science Foundation of China (71371013, 71772002)

2024

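Microelectronics & Computer (微电子学与计算机)
The 771 Research Institute, Ninth Academy of China Aerospace Science and Technology Corporation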


CSTPCD
Impact factor: 0.431
ISSN: 1000-7180
Year, volume (issue): 2024, 41(9)