Speech Emotion Recognition with Multi-task Learning

Recent work on speech emotion recognition uses deep learning models to identify emotion from speech signals. However, traditional single-task models pay too little attention to the acoustic emotional information in speech, which results in low recognition accuracy. This paper therefore proposes an end-to-end speech emotion recognition network based on multi-task learning that mines acoustic emotion in speech to improve recognition accuracy. To avoid the information loss caused by frequency-domain features, the model uses the self-supervised network Wav2vec2.0, which operates on the time-domain signal, as its backbone to extract acoustic and semantic features of speech, and an attention mechanism fuses the two kinds of features into self-supervised features. To make full use of the acoustic emotional information in speech, emotion-related phoneme recognition serves as an auxiliary task, and multi-task learning mines the acoustic emotion in the self-supervised features. Experimental results on the public dataset IEMOCAP show that the proposed multi-task learning model achieves a weighted accuracy of 76.0% and an unweighted accuracy of 76.9%, a clear improvement over the traditional single-task learning model. Ablation experiments further verify the effectiveness of the auxiliary task and of the fine-tuning strategy for the self-supervised network.
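
The abstract reports both a weighted and an unweighted accuracy. As a minimal sketch, assuming the definitions commonly used in speech emotion recognition (weighted accuracy as the overall fraction of correctly classified utterances, unweighted accuracy as the mean of per-class recalls), the two metrics can be computed as follows. The function names, emotion labels, and toy predictions are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct utterances / all utterances (assumed WA definition)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally (assumed UA definition)."""
    totals = Counter(y_true)  # utterances per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical, imbalanced toy data: six "neutral" and two "happy" utterances.
y_true = ["neutral"] * 6 + ["happy"] * 2
y_pred = ["neutral"] * 6 + ["happy", "neutral"]

if __name__ == "__main__":
    print(f"WA = {weighted_accuracy(y_true, y_pred):.3f}")
    print(f"UA = {unweighted_accuracy(y_true, y_pred):.3f}")
```

On imbalanced data the two numbers diverge: in the toy example the majority class dominates the weighted accuracy, while the unweighted accuracy penalizes the missed minority-class utterance more heavily.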

Keywords: deep learning; multi-task learning; speech emotion recognition; self-supervised model; fine-tuning strategy

Li Yunfeng (李云峰), Yan Zulong (闫祖龙), Gao Tian (高天), Fang Xin (方昕), Zou Liang (邹亮)

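School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116

Core R&D Platform, iFLYTEK Co., Ltd., Hefei 230088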

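Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0107300); Xuzhou Basic Research Program (KC22020)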

2024

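Journal of Data Acquisition and Processing
Sponsors: Chinese Institute of Electronics; Signal Processing Society of the China Instrument and Control Society; Weak Signal Detection Society of the China Instrument and Control Society and the Chinese Physical Society; Nanjing University of Aeronautics and Astronautics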

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.679
ISSN: 1004-9037
Year, volume (issue): 2024, 39(2)