Research and application of multimodality in emotion recognition
To address noise interference such as typos, grammatical errors, and the special vocabulary of Internet culture, this paper studies emotion recognition based on multi-modal fusion and proposes an emotion recognition network model built on modal fusion. First, features are extracted from the three modalities, and the formats of the multi-modal data are unified and aligned. Then, to mine the relationships between modalities, the text, audio, and video features are fused, so that the complementary information carried by the fused features mitigates the noise interference. On this basis, an attention mechanism and a bidirectional recurrent neural network are used to more fully capture the fused features and the contextual information across different emotional utterances, yielding a richer fused feature representation. Finally, a downstream task module is built that uses this rich fused representation to improve the performance of the downstream emotion recognition task. Experiments with the proposed network model were carried out on three datasets. The results show that the multi-modal model outperforms its single-modal counterparts, and that the emotion recognition network based on modal fusion achieves better recognition performance.
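The pipeline the abstract describes (align the three modalities to a common format, fuse them, then apply attention over utterance context) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the feature dimensions, the random projections used for alignment, and the dot-product attention are all assumptions, and the bidirectional recurrent layer is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-utterance features for the three modalities
# (dimensions are illustrative, not taken from the paper).
T = 5                                    # utterances in one dialogue
text  = rng.standard_normal((T, 100))    # text features
audio = rng.standard_normal((T, 74))     # audio features
video = rng.standard_normal((T, 35))     # video features

def project(x, d=64, seed=1):
    """Align a modality to a common feature dimension d
    via a fixed random linear map (stand-in for a learned layer)."""
    w_rng = np.random.default_rng(seed)
    W = w_rng.standard_normal((x.shape[1], d)) / np.sqrt(x.shape[1])
    return np.tanh(x @ W)

# Step 1: unify/align formats, then fuse by concatenation.
fused = np.concatenate([project(text,  seed=1),
                        project(audio, seed=2),
                        project(video, seed=3)], axis=1)   # (T, 192)

# Step 2: scaled dot-product self-attention over the dialogue,
# letting each utterance attend to its context.
scores  = fused @ fused.T / np.sqrt(fused.shape[1])        # (T, T)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)              # softmax rows
context = weights @ fused                                  # (T, 192)

print(context.shape)  # (5, 192): context-enriched fused representation
```

In a full model, `context` would feed the downstream classification head, and the random projections and attention would be replaced by trained layers alongside the bidirectional recurrent network.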