Multimodal speech emotion recognition based on text feature energy encoding
Energy is an important characteristic of emotional expression. Different words carry different energy values when spoken, reflecting the speaker's emotional state. When speech is transcribed into text, the energy expressed by each word is not retained, so this information is lost when text features are extracted. Therefore, for the text modality, this paper proposes an energy encoding scheme that adds the energy value of each word and each pause in the speech signal to the transcribed text, so that the text features contain energy information; discourse-level text features are then obtained through the DC-BERT model. For the speech modality, the openSMILE toolkit is used to extract shallow acoustic features, and a random forest (RF) algorithm selects the 1000 features with the highest importance for emotion as a new feature set. Deep features are extracted from this new feature set through a Transformer encoder network, and the shallow and deep features are fused to form multi-level speech emotion features. Finally, a bidirectional long short-term memory network with attention (BiLSTM-ATT), based on a self-attention mechanism, is used to classify emotions. Experiments show that the weighted accuracy of the proposed method on the IEMOCAP dataset reaches 76.49%.
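To make the energy encoding idea concrete, the sketch below computes per-word RMS energy from the waveform and interleaves quantized energy tokens with the transcript. The abstract does not specify the exact token format; the word-level timestamps (e.g., from forced alignment), the 50 ms pause threshold, and the `[E*]`/`[PAUSE_E*]` token names are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def rms_energy(signal, start, end, sr=16000):
    """RMS energy of the waveform between two timestamps (in seconds)."""
    seg = signal[int(start * sr):int(end * sr)]
    return float(np.sqrt(np.mean(seg ** 2))) if len(seg) else 0.0

def encode_energy(signal, words, sr=16000, n_levels=10):
    """Interleave quantized energy tokens with the transcribed words.

    `words` is a list of (word, start, end) tuples, e.g. from forced
    alignment; gaps between consecutive words are treated as pauses
    and receive their own energy token.
    """
    def quantize(en, hi):
        return min(int(n_levels * en / hi), n_levels - 1)

    energies = [rms_energy(signal, s, e, sr) for _, s, e in words]
    hi = max(energies) or 1.0  # avoid division by zero on silence
    tokens, prev_end = [], 0.0
    for (word, start, end), en in zip(words, energies):
        if start - prev_end > 0.05:  # pause longer than 50 ms (assumed)
            p = rms_energy(signal, prev_end, start, sr)
            tokens.append(f"[PAUSE_E{quantize(p, hi)}]")
        tokens += [word, f"[E{quantize(en, hi)}]"]
        prev_end = end
    return " ".join(tokens)

# Toy example: a 1-second synthetic signal and a hand-made alignment
sig = np.random.randn(16000).astype(np.float32)
words = [("i", 0.0, 0.2), ("am", 0.3, 0.5), ("fine", 0.6, 1.0)]
print(encode_energy(sig, words))
```

The energy-augmented string can then be tokenized and fed to the text encoder in place of the plain transcript.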
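The RF-based feature selection step can be sketched with scikit-learn as below. The 6373-dimensional ComParE functional set and the forest size are assumptions for illustration; only the top-1000 selection follows directly from the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# X: openSMILE feature matrix (n_utterances x n_features); y: emotion
# labels. Synthetic stand-ins are used here for a runnable example.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6373))   # 6373 = ComParE set size (assumed)
y = rng.integers(0, 4, size=500)   # 4 emotion classes, as in IEMOCAP

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X, y)

# Keep the 1000 features with the highest impurity-based importance
top = np.argsort(rf.feature_importances_)[::-1][:1000]
X_selected = X[:, top]
print(X_selected.shape)  # (500, 1000)
```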
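One common reading of a BiLSTM-ATT classifier is a BiLSTM whose hidden states are pooled over time by a learned attention layer before a softmax classifier. The PyTorch sketch below follows that reading under assumed layer sizes; it is not necessarily the paper's exact architecture.

```python
import torch
import torch.nn as nn

class BiLSTMAtt(nn.Module):
    """BiLSTM followed by attention pooling over time, then a classifier.
    A generic BiLSTM-ATT head; all dimensions are illustrative."""
    def __init__(self, input_dim, hidden_dim=128, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.att = nn.Linear(2 * hidden_dim, 1)   # attention scorer
        self.fc = nn.Linear(2 * hidden_dim, n_classes)

    def forward(self, x):                 # x: (batch, time, input_dim)
        h, _ = self.lstm(x)               # (batch, time, 2*hidden)
        w = torch.softmax(self.att(h), dim=1)  # weights over time steps
        context = (w * h).sum(dim=1)      # weighted sum -> (batch, 2*hidden)
        return self.fc(context)           # emotion logits

model = BiLSTMAtt(input_dim=256)
logits = model(torch.randn(8, 50, 256))  # 8 utterances, 50 frames each
print(logits.shape)                      # torch.Size([8, 4])
```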