Speech Emotion Recognition with Complementary Feature Learning Framework and Attentional Feature Fusion Module
Deep learning feature extraction methods often fail to comprehensively extract and effectively integrate emotional features from speech. To address this limitation, this paper proposes a novel speech emotion recognition model that integrates a complementary feature learning framework with an attention feature fusion module. The complementary feature learning framework consists of two independent representation extraction branches and an interactive complementary representation extraction branch, thoroughly covering both the independent and the complementary representations of emotional features. To further optimize model performance, an attention feature fusion module is introduced. This module assigns weights according to how much each representation contributes to emotion classification, enabling the model to focus on the features most beneficial for emotion recognition. Experiments conducted on two public emotion databases (Emo-DB and IEMOCAP) validate the robustness and effectiveness of the proposed model.
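The abstract describes the architecture only at a high level. The sketch below is a minimal PyTorch-style illustration, under assumed layer choices (GRU branches, multi-head attention for the interactive complementary branch, and a learned softmax weighting for the fusion module), of how the two independent branches, the interactive branch, and the attention feature fusion might be wired together; all class names, layers, and dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AttentionFeatureFusion(nn.Module):
    """Weights each branch's representation by a learned attention score,
    so representations that contribute more to emotion classification
    receive larger weights in the fused feature."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one scalar score per representation

    def forward(self, reps: list[torch.Tensor]) -> torch.Tensor:
        # reps: list of (batch, dim) representations, one per branch
        stacked = torch.stack(reps, dim=1)                   # (batch, branches, dim)
        weights = torch.softmax(self.score(stacked), dim=1)  # (batch, branches, 1)
        return (weights * stacked).sum(dim=1)                # (batch, dim)

class ComplementarySERModel(nn.Module):
    """Two independent extraction branches plus an interactive complementary
    branch, fused by attention and passed to an emotion classifier."""
    def __init__(self, in_dim: int, hid: int, num_classes: int = 4):
        super().__init__()
        self.branch_a = nn.GRU(in_dim, hid, batch_first=True)
        self.branch_b = nn.GRU(in_dim, hid, batch_first=True)
        # interactive branch: one branch's features attend over the other's
        self.interact = nn.MultiheadAttention(hid, num_heads=4, batch_first=True)
        self.fusion = AttentionFeatureFusion(hid)
        self.classifier = nn.Linear(hid, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, in_dim) frame-level acoustic features
        a, _ = self.branch_a(x)
        b, _ = self.branch_b(x)
        c, _ = self.interact(a, b, b)           # complementary representation
        reps = [a[:, -1], b[:, -1], c[:, -1]]   # last-frame summaries per branch
        return self.classifier(self.fusion(reps))
```

For example, `ComplementarySERModel(in_dim=40, hid=128)` applied to a batch of 40-dimensional log-mel frame sequences would yield per-utterance emotion logits; the actual branch architectures and fusion details are given in the paper itself.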