Computer Engineering and Applications, 2024, Vol. 60, Issue 16: 168-176. DOI: 10.3778/j.issn.1002-8331.2305-0049


C-BGA:Multimodal Speech Emotion Recognition Network Combining Contrastive Learning

MIAO Borui, XU Yunfeng, ZHAO Shaojie, WANG Jialin
Author Information

  • 1. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050000, China


Abstract

Current multimodal speech emotion recognition (SER) datasets are small in scale yet information-dense, so models underfit the individual modalities and fail to exploit the information latent in the data. To address this problem, a multimodal speech emotion classification network based on contrastive learning is proposed. On the one hand, skip connections (SC) are introduced into the network to effectively mitigate network degradation; on the other hand, a new loss formulation based on contrastive learning (CL) theory is proposed to accelerate model convergence. Evaluated on the IEMOCAP dataset, the model achieves an unweighted accuracy (UA) of 82.68% and a weighted accuracy (WA) of 82.35%; the experimental results demonstrate the model's effectiveness.
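The abstract does not specify the exact form of the proposed contrastive loss. As a rough illustration of the general idea only (a minimal sketch, assuming SupCon-style supervised contrastive learning where utterances sharing an emotion label are treated as positives; the function name and shapes below are hypothetical, not the paper's C-BGA loss):

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Toy supervised contrastive loss over a batch of utterance embeddings.

    features: (N, D) embeddings; labels: (N,) integer emotion labels.
    Samples with the same label are pulled together, others pushed apart.
    """
    # Normalize embeddings so the dot product is cosine similarity.
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = features @ features.T / temperature
    n = sim.shape[0]
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # exclude anchor-to-itself pairs
    # Row-wise log-softmax over all other samples in the batch.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # Positives: other samples carrying the same emotion label.
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    # Mean negative log-probability of positives, averaged over anchors.
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return per_anchor.mean()
```

Minimizing this loss drives same-emotion embeddings together regardless of modality, which is one plausible way a contrastive objective can speed up convergence on a small dataset.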


Key words

multimodal; speech emotion recognition; contrastive learning; attention mechanism


Funding

Key Research and Development Program of Hebei Province (21373802D)

Ministry of Education Artificial Intelligence Collaborative Education Project (201801003011)

Publication Year

2024

Journal: Computer Engineering and Applications (计算机工程与应用)
Publisher: North China Institute of Computing Technology
Indexing: CSTPCD; Peking University Core Journal (北大核心)
Impact Factor: 0.683
ISSN: 1002-8331