A study on emotion recognition in real-world scenes
Emotion recognition research faces many challenges when advancing from laboratory environments to unconstrained real-world scenarios. Unrestricted individual activity and complex environments make it difficult to reliably obtain single-modal data such as facial images and speech, and people's spontaneous emotions in real-life scenarios are far more subtle and less intensely expressed, which increases the difficulty of recognition. Therefore, to recognize individual emotions in real scenes more robustly, a feature extraction network is designed to fully mine the complementary emotion information in multimodal data such as face, skeleton, posture, and scene, matching the characteristics of individual activities; at the same time, a feature fusion module is designed that attends to the relationships between the different modalities and merges their features. The network achieves the best recognition performance on the challenging PLPS-E dataset of real public-space scenes, with emotion recognition accuracies of 74.62%, 79.15%, and 87.94% on the V, A, and D dimensions; it also achieves comparable performance on the relatively simple real-scene FABE dataset, with 98.387% recognition accuracy on the V dimension. The experiments show the effectiveness of the proposed algorithm.
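The fusion idea described above can be illustrated with a minimal sketch: each modality (face, skeleton, posture, scene) is first encoded into a fixed-length vector, the vectors are concatenated, and a learned linear map projects the joint vector onto the three VAD dimensions. All names, embedding sizes, and the linear projection here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed-length embeddings for each modality
# (sizes are assumptions, not taken from the paper).
face = rng.standard_normal(128)
skeleton = rng.standard_normal(64)
posture = rng.standard_normal(64)
scene = rng.standard_normal(256)

def fuse(features, weights):
    """Concatenate per-modality embeddings into one joint vector,
    then apply a linear map as a stand-in for a learned fusion layer
    that outputs one score per VAD dimension."""
    joint = np.concatenate(features)  # shape: (512,)
    return weights @ joint            # shape: (3,) -> V, A, D

# Randomly initialised weights stand in for trained parameters.
w = rng.standard_normal((3, 128 + 64 + 64 + 256))
vad = fuse([face, skeleton, posture, scene], w)
print(vad.shape)  # (3,)
```

In practice, simple concatenation ignores cross-modal relationships; attention-style fusion, as the abstract's "connection between different data" suggests, would weight each modality's contribution before merging.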