Journal of Fudan University (Natural Science), 2024, Vol. 63, Issue 1: 18-31.

Construction and Evaluation of Mandarin Multimodal Emotional Speech Database

Li Liangqi (李良琦) 1, Zhang Xueying (张雪英) 1, Duan Shufei (段淑斐) 1, Xiao Zhongzhe (肖仲喆) 2, Jia Hairong (贾海蓉) 1, Liang Huizhi (梁慧芝) 3

Author information

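  • 1. College of Electronic Information and Optical Engineering, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China
  • 2. School of Optoelectronic Science and Engineering, Soochow University, Suzhou 215006, Jiangsu, China
  • 3. School of Computing, Newcastle University, Newcastle NE1 7RU, UK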

Abstract

This paper designs and builds a multimodal emotional speech database for Mandarin Chinese covering articulatory kinematics, acoustics, glottal signals, and facial micro-expressions. The corpus design, participant selection, recording procedure, and data processing are described in detail. The signals are annotated with discrete emotion labels (neutral, pleasant, happy, apathetic, angry, sad, grief) and dimensional emotion labels (pleasure, activation, and dominance, i.e. the PAD dimensions). The dimensional annotations are analyzed statistically to verify their validity. In addition, the annotators' SCL-90 scale data are validated and analyzed together with the PAD annotations to explore the relationship between outliers in the annotation and the annotators' psychological condition. To verify the speech quality and emotional discriminability of the database, three basic classification models, Support Vector Machine (SVM), Convolutional Neural Network (CNN), and Deep Neural Network (DNN), are used to compute recognition rates for the seven emotions. The average recognition rate over the seven emotions reaches 82.56% with acoustic data alone, 72.51% with glottal data alone, and 55.67% with kinematic data alone. The database is therefore of high quality and can serve as an important resource for speech analysis research, especially for multimodal emotional speech analysis.

Key words

emotional speech database/multimodal emotion recognition/dimensional emotional space/three-dimensional electromagnetic articulography/electroglottography

Funding

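National Natural Science Foundation of China, Young Scientists Fund (12004275)

Shanxi Province Applied Basic Research Program, General Natural Science Fund (20210302123186)

Shanxi Province Merit-Based Funding Program for Science and Technology Activities of Returned Overseas Scholars (20200017)

Taiyuan University of Technology Research Start-up Fund for Introduced Talents (tyutrc201405b)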

Publication year

2024
Journal of Fudan University (Natural Science)
Fudan University

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.388
ISSN: 0427-7104
Number of references: 11