Speech Emotion Recognition with Multi-task Learning
In recent speech emotion recognition research, deep learning models are used to identify emotion from speech signals. However, traditional single-task learning models do not pay enough attention to the acoustic emotional information in speech, resulting in low emotion recognition accuracy. In view of this, this paper proposes a multi-task, end-to-end speech emotion recognition network that mines the acoustic emotion in speech to improve recognition accuracy. To avoid the information loss caused by using frequency-domain features, the model adopts Wav2vec2.0 as its backbone network to extract the acoustic and semantic features of speech, and an attention mechanism integrates the two kinds of features into self-supervised features. To make full use of the acoustic emotional information in speech, emotion-related phoneme recognition is used as an auxiliary task, and the multi-task learning model mines the acoustic emotion in the self-supervised features. Experimental results on the public IEMOCAP dataset show that the proposed multi-task learning model achieves a weighted accuracy of 76.0% and an unweighted accuracy of 76.9%, a significant improvement over the traditional single-task learning model. Ablation experiments further verify the effectiveness of the auxiliary task and the self-supervised network fine-tuning strategy.
deep learning; multi-task learning; speech emotion recognition; self-supervised model; fine-tuning strategy
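To make the described architecture concrete, the following is a minimal sketch (not the authors' released code) of a multi-task model of this kind: a shared Wav2vec2.0 backbone feeding an emotion-classification head and an auxiliary phoneme-recognition head, with attention pooling over frame-level features. The checkpoint name, head sizes, pooling scheme, and auxiliary loss weight `alpha` are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class MultiTaskSER(nn.Module):
    """Sketch of a multi-task SER model with a shared Wav2vec2.0 backbone."""
    def __init__(self, num_emotions=4, num_phonemes=40, alpha=0.3):
        super().__init__()
        # Shared self-supervised backbone; fine-tuned jointly with both heads.
        self.backbone = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        hidden = self.backbone.config.hidden_size
        # Simple self-attention pooling over frames: one plausible way to fuse
        # frame-level features into an utterance-level vector.
        self.attn = nn.Linear(hidden, 1)
        self.emotion_head = nn.Linear(hidden, num_emotions)       # main task
        self.phoneme_head = nn.Linear(hidden, num_phonemes + 1)   # aux task (+1 for CTC blank)
        self.alpha = alpha  # weight of the auxiliary phoneme loss (assumed value)

    def forward(self, waveform):
        feats = self.backbone(waveform).last_hidden_state          # (B, T, H)
        weights = torch.softmax(self.attn(feats), dim=1)           # (B, T, 1)
        pooled = (weights * feats).sum(dim=1)                      # (B, H)
        emotion_logits = self.emotion_head(pooled)                 # (B, num_emotions)
        phoneme_logits = self.phoneme_head(feats)                  # (B, T, num_phonemes + 1)
        return emotion_logits, phoneme_logits

# Joint objective: cross-entropy on emotion plus CTC on the phoneme sequence.
model = MultiTaskSER()
wave = torch.randn(2, 16000)                                       # two 1 s utterances at 16 kHz
emo_logits, pho_logits = model(wave)
emo_loss = nn.functional.cross_entropy(emo_logits, torch.tensor([0, 2]))
log_probs = pho_logits.log_softmax(-1).transpose(0, 1)             # (T, B, C) layout for CTC
targets = torch.randint(1, 41, (2, 10))                            # dummy phoneme labels
ctc_loss = nn.functional.ctc_loss(
    log_probs, targets,
    input_lengths=torch.full((2,), log_probs.size(0)),
    target_lengths=torch.full((2,), 10),
)
loss = emo_loss + model.alpha * ctc_loss
```

In this reading of the abstract, the auxiliary phoneme task only shapes the shared representation during training; at inference time only the emotion head is used.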