Speech emotion recognition algorithm based on MHA-ResNet

Abstract: A key challenge in speech emotion recognition is extracting salient features from speech signals to improve recognition accuracy. Building on existing research, this paper proposes a speech emotion recognition model based on a Multi-Head-Attention Residual Network (MHA-ResNet). First, the raw speech signal is preprocessed and an emotional feature set is extracted. Next, the multi-head attention mechanism, which processes inputs in parallel and attends adaptively, is applied to the feature set to obtain an initial set of discriminative emotional information across different states. Finally, the residual network captures deeper emotional features and classifies the emotions. To validate the model, experiments were conducted on the CASIA and EmoDB datasets, yielding recognition accuracies of 93.59% and 97.57%, respectively.
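The pipeline outlined in the abstract (feature set, then multi-head attention, then a residual stage, then classification) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the weights are random stand-ins for learned parameters, the residual branch uses dense layers in place of whatever convolutional layers the paper uses, and the names `multi_head_attention`, `residual_block`, `classify`, the 5x8 feature matrix, and the six-class output are all hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Small random weights; stand-ins for learned parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def multi_head_attention(x, heads, rng):
    """Scaled dot-product self-attention per head, with head outputs concatenated."""
    d_model = len(x[0])
    d_head = d_model // heads
    head_outputs = []
    for _ in range(heads):
        wq, wk, wv = (random_matrix(d_model, d_head, rng) for _ in range(3))
        q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
        k_t = [list(col) for col in zip(*k)]
        scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(q, k_t)]
        weights = [softmax(row) for row in scores]  # each frame attends to all frames
        head_outputs.append(matmul(weights, v))
    # Concatenate the heads along the feature axis, then apply an output projection.
    concat = [sum((h[t] for h in head_outputs), []) for t in range(len(x))]
    return matmul(concat, random_matrix(d_model, d_model, rng))

def residual_block(x, rng):
    """y = x + F(x), with F = linear -> ReLU -> linear (dense stand-in, assumption)."""
    d = len(x[0])
    w1, w2 = random_matrix(d, d, rng), random_matrix(d, d, rng)
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
    fx = matmul(hidden, w2)
    return [[a + b for a, b in zip(rx, rf)] for rx, rf in zip(x, fx)]

def classify(x, num_classes, rng):
    """Mean-pool over frames, then a linear layer and softmax over emotion classes."""
    pooled = [sum(col) / len(x) for col in zip(*x)]
    w = random_matrix(len(pooled), num_classes, rng)
    return softmax(matmul([pooled], w)[0])

rng = random.Random(0)
features = random_matrix(5, 8, rng)  # hypothetical: 5 frames x 8 features
attended = multi_head_attention(features, heads=2, rng=rng)
deep = residual_block(attended, rng)
probs = classify(deep, num_classes=6, rng=rng)  # hypothetical class count
```

The skip connection in `residual_block` adds the attention output back to the transformed features, which is the defining property of a residual network; the attention stage leaves the frame and feature dimensions unchanged so the two stages can be chained directly.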

Keywords:

speech emotion recognition; multi-head attention mechanism; residual network; emotional feature set

Authors:

Zhou Chuanhua (周传华), Hao Min (郝敏), Zeng Hui (曾辉), Wang Yong (王勇)

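Affiliations:

School of Management Science and Engineering, Anhui University of Technology, Ma'anshan, Anhui 243002

School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230026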

Funding:

National Natural Science Foundation of China (two grants)

Grant numbers:

71371013, 71772002

Publication year:

2024

DOI:

10.19304/J.ISSN1000-7180.2023.0710

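Journal: Microelectronics & Computer (微电子学与计算机)

771st Research Institute, Ninth Academy of China Aerospace Science and Technology Corporation

CSTPCD

Impact factor: 0.431

ISSN: 1000-7180

Year, volume (issue): 2024, 41(9)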