Speech and Expression Multi-Modal Emotion Recognition Method Using Weighted Fusion
Current multi-modal fusion methods do not fully exploit the complementarity between the speech and expression modalities, which results in low recognition rates for multi-modal emotion recognition. To solve this problem, this paper proposes a speech and expression multi-modal emotion recognition method based on weighted fusion. The method first uses a voice activity detection (VAD) algorithm to extract speech keyframes. Then, information entropy is used to model the generation of emotion as a continuous process, and expression keyframes are extracted. In addition, to fully exploit the complementarity between the speech and expression modalities, a speech-expression keyframe alignment technique is used to calculate speech and expression weights. These weights are fed into the feature fusion layer for weighted fusion, which effectively improves the recognition rate of speech and expression multi-modal emotion recognition. Finally, experimental results on the RML, eNTERFACE05, and BAUM-1s datasets show that the recognition rate of the proposed method is higher than that of other benchmark methods.
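The weighted fusion step described above can be sketched as follows. This is a minimal illustration only: the function name, the weight normalization, and the concatenation strategy are assumptions for clarity, not the paper's exact formulation.

```python
import numpy as np

def weighted_fusion(speech_feat, expr_feat, w_speech, w_expr):
    """Scale each modality's feature vector by its (normalized) weight
    and concatenate the results into a single fused feature vector.

    Hypothetical sketch: the paper computes the weights from
    speech-expression keyframe alignment; here they are given directly.
    """
    # Normalize the two modality weights so they sum to 1
    total = w_speech + w_expr
    w_s, w_e = w_speech / total, w_expr / total
    # Weight each modality's features, then concatenate
    return np.concatenate([w_s * speech_feat, w_e * expr_feat])

# Toy example with made-up feature vectors and weights
speech = np.array([0.2, 0.8, 0.5])   # e.g. speech keyframe features
expr = np.array([0.6, 0.1])          # e.g. expression keyframe features
fused = weighted_fusion(speech, expr, w_speech=0.7, w_expr=0.3)
```

The fused vector would then feed a downstream classifier; other fusion choices (weighted sum of projected features, attention) fit the same interface.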