
Improved ConvMixer and Focal Loss with Dynamic Weight for Audio-Visual Emotion Recognition

Audio-visual bimodal emotion recognition is a research hotspot in affective computing. Existing emotion recognition methods cannot extract local and global video features at the same time, fuse multimodal data in an overly simple way, and use loss functions that fail to focus on misclassified samples during model optimization, which limits recognition accuracy. This paper proposes an audio-visual emotion recognition method based on an improved ConvMixer and a focal loss function with dynamic weights. Spatial and temporal adjacency matrices replace the depthwise separable convolution in ConvMixer to extract global and local features in the spatial and temporal domains of video. A cross-modal temporal attention module with a symmetric structure captures the temporal correlation between modalities and improves feature fusion. A focal loss function with dynamic weights is computed from the confusion matrix, differentially increasing the contribution of misclassified samples to the loss so as to optimize the model parameters. Experimental results on public datasets show that the proposed method extracts representative features, effectively optimizes the network structure, and improves emotion recognition accuracy.
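The abstract does not give the internal details of the cross-modal temporal attention module; the following is a minimal sketch of one way a symmetric cross-modal temporal attention block could look, assuming each modality attends over the other modality's time steps and the two branches are pooled and concatenated for fusion. The dimensions, head count, and concatenation-based fusion are illustrative assumptions, not the authors' reported design.

```python
# Sketch of a symmetric cross-modal temporal attention block: audio queries
# attend to video time steps and video queries attend to audio time steps,
# mirroring the symmetric structure described in the abstract. All sizes are
# illustrative assumptions.
import torch
import torch.nn as nn


class CrossModalTemporalAttention(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.audio_to_video = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.video_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, audio: torch.Tensor, video: torch.Tensor) -> torch.Tensor:
        # audio: (batch, T_a, dim), video: (batch, T_v, dim)
        a_att, _ = self.audio_to_video(query=audio, key=video, value=video)
        v_att, _ = self.video_to_audio(query=video, key=audio, value=audio)
        a = self.norm_a(audio + a_att)  # residual + norm, audio branch
        v = self.norm_v(video + v_att)  # residual + norm, video branch
        # Pool over time and concatenate the two symmetric branches for fusion.
        return torch.cat([a.mean(dim=1), v.mean(dim=1)], dim=-1)


if __name__ == "__main__":
    block = CrossModalTemporalAttention(dim=256, num_heads=4)
    audio = torch.randn(2, 50, 256)   # e.g. 50 audio frames
    video = torch.randn(2, 16, 256)   # e.g. 16 video frames
    print(block(audio, video).shape)  # torch.Size([2, 512])
```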
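As a rough illustration of the loss design, here is a minimal sketch of a focal loss whose per-class weights are refreshed from a confusion matrix so that frequently misclassified classes contribute more to the loss. The specific update rule (weight grows with the class error rate), the gamma value, and the class count are assumptions for illustration; the abstract does not specify the authors' exact formula.

```python
# Sketch of a focal loss with per-class weights updated from a running
# confusion matrix. The weighting rule below is a hypothetical choice, not the
# paper's reported formula.
import torch
import torch.nn.functional as F


class DynamicWeightFocalLoss(torch.nn.Module):
    def __init__(self, num_classes: int, gamma: float = 2.0):
        super().__init__()
        self.gamma = gamma
        self.num_classes = num_classes
        # Start from uniform class weights; refreshed from the confusion matrix.
        self.register_buffer("alpha", torch.ones(num_classes))

    @torch.no_grad()
    def update_from_confusion(self, confusion: torch.Tensor) -> None:
        """confusion[i, j] = count of true class i predicted as class j."""
        per_class_total = confusion.sum(dim=1).clamp(min=1)
        per_class_correct = confusion.diagonal()
        error_rate = 1.0 - per_class_correct / per_class_total
        # Hypothetical rule: weight = 1 + error rate, renormalized to mean 1.
        alpha = 1.0 + error_rate
        self.alpha = alpha * self.num_classes / alpha.sum()

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        log_probs = F.log_softmax(logits, dim=-1)
        log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()
        weights = self.alpha.to(logits.device)[targets]
        # Standard focal term (1 - p_t)^gamma scaled by the dynamic class weight.
        return (-weights * (1.0 - pt) ** self.gamma * log_pt).mean()


if __name__ == "__main__":
    loss_fn = DynamicWeightFocalLoss(num_classes=7)
    logits = torch.randn(8, 7)
    targets = torch.randint(0, 7, (8,))
    # The confusion matrix would normally come from the previous epoch's predictions.
    confusion = torch.randint(0, 20, (7, 7)).float()
    loss_fn.update_from_confusion(confusion)
    print(loss_fn(logits, targets))
```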

Keywords: emotion recognition; ConvMixer; attention mechanism; multi-modal feature fusion; focal loss function

师硕、覃嘉俊、于洋、郝小可


School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China


Funding: National Natural Science Foundation of China (61806071, 62102129); Natural Science Foundation of Hebei Province (F2020202025, F2021202030)

Journal: Acta Electronica Sinica (电子学报)
Publisher: Chinese Institute of Electronics
Indexing: CSTPCD; Peking University Core Journal (北大核心)
Impact factor: 1.237
ISSN: 0372-2112
Year, Volume (Issue): 2024, 52(8)