Construction and Evaluation of Mandarin Multimodal Emotional Speech Database
This paper designs and establishes a multimodal emotional Mandarin Chinese speech database that includes articulatory kinematics, acoustics, glottal signals, and facial micro-expressions. The database is described in detail in terms of corpus design, participant selection, recording procedure, and data processing. Signals are annotated with discrete emotion labels (neutral, pleasant, happy, apathetic, angry, sad, grief) and dimensional emotion labels (pleasure, activation, dominance). The dimensionally annotated data are statistically analyzed to verify the validity of the annotation, and outliers in the annotation are examined in combination with the SCL-90 scale; the annotators' SCL-90 scores are then analyzed against the PAD annotation data to explore the intrinsic relationship between annotation outliers and the annotators' psychological condition. To verify the speech quality and emotion discriminability of the database, three baseline classification models, Support Vector Machine (SVM), Deep Neural Network (DNN), and Convolutional Neural Network (CNN), are used to compute the recognition rate for the seven emotion categories. The results show that the average recognition rate over all seven emotions reaches 82.56% with acoustic data alone, 72.51% with glottal data alone, and 55.67% with kinematic data alone. The database is therefore of high quality and can serve as an important resource for the speech analysis research community, especially for the task of multimodal emotional speech analysis.
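As a rough illustration of the kind of baseline evaluation the abstract describes, the sketch below trains an SVM on per-utterance feature vectors and reports the average recognition rate over the seven emotion categories. The feature dimensions and data are synthetic placeholders, not the database's actual features or results; only the seven emotion labels come from the paper.

```python
# Hedged sketch: SVM baseline for seven-way emotion classification.
# All features here are synthetic stand-ins for real acoustic features.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

EMOTIONS = ["neutral", "pleasant", "happy", "apathetic", "angry", "sad", "grief"]

rng = np.random.default_rng(0)
n_per_class, n_features = 40, 24  # placeholder sizes

# Synthetic per-utterance feature vectors: one Gaussian cluster per emotion.
X = np.vstack([rng.normal(loc=i, scale=1.0, size=(n_per_class, n_features))
               for i in range(len(EMOTIONS))])
y = np.repeat(np.arange(len(EMOTIONS)), n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_tr, y_tr)

# Average recognition rate = mean per-class accuracy on the held-out set.
pred = clf.predict(X_te)
per_class = [np.mean(pred[y_te == c] == c) for c in range(len(EMOTIONS))]
avg_rate = float(np.mean(per_class))
print(f"average recognition rate: {avg_rate:.2%}")
```

The same evaluation loop would apply to the DNN and CNN baselines, swapping the classifier while keeping the per-class averaging of accuracy.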