
Speech emotion recognition based on amplitude filtering and a hierarchical feature fusion strategy

To address the low accuracy of speech emotion recognition on multi-language joint datasets, a speech emotion recognition method based on amplitude filtering and a hierarchical feature fusion strategy is proposed. The method first applies amplitude filtering according to the amplitude distribution of the Mel spectrogram: probability superposition enlarges the differences between similar amplitudes, giving strong gain at high frequencies and weak gain at low frequencies, while probability multiplication reduces the differences between distant amplitudes, revealing the detail of the middle frequencies. On this basis, rectangular convolution extracts the temporal dynamic features of the audio signal and generates a dynamic feature map of the Mel spectrogram, which serves as the input to the hierarchical feature fusion strategy. This strategy compresses the feature map to extract temporal dynamic features at different scales, and also extracts temporal dynamic features from different depths. The proposed method achieves a classification accuracy of 84.44% on the multi-language joint dataset CER.
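The abstract gives no kernel sizes, pooling factors, or fusion details, so the following is only a minimal NumPy sketch of the two general ideas it names: a rectangular (non-square) convolution that responds to change along the time axis, and compression of the resulting feature map at several time scales before concatenation. The function names `rect_conv_time` and `multiscale_pool`, the 1x3 difference kernel, and the pooling factors are all assumptions made for illustration, not the authors' implementation. The amplitude filtering step is not sketched because the abstract does not define its probability operations.

```python
import numpy as np


def rect_conv_time(spec, kernel):
    """Valid-mode 2-D cross-correlation of a (n_mels, n_frames) array
    with a rectangular kernel of shape (kh, kw). Hypothetical helper."""
    kh, kw = kernel.shape
    # windows has shape (n_mels - kh + 1, n_frames - kw + 1, kh, kw)
    windows = np.lib.stride_tricks.sliding_window_view(spec, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)


def multiscale_pool(feat, factors=(1, 2, 4)):
    """Average-pool the time axis by each factor (an assumed choice),
    take the maximum over time, and concatenate the per-scale vectors."""
    parts = []
    for f in factors:
        n = (feat.shape[1] // f) * f  # drop frames that do not fill a block
        pooled = feat[:, :n].reshape(feat.shape[0], -1, f).mean(axis=2)
        parts.append(pooled.max(axis=1))
    return np.concatenate(parts)


# Toy "spectrogram": 4 mel bands, 8 frames, amplitude rising linearly in time.
spec = np.tile(np.arange(8.0), (4, 1))

# Assumed 1x3 kernel, wider along time than along frequency.
kernel = np.array([[-1.0, 0.0, 1.0]])

dynamic_map = rect_conv_time(spec, kernel)  # shape (4, 6)
fused = multiscale_pool(dynamic_map)        # shape (12,)
```

In a trained model the kernel weights would be learned and the fused vector would feed a classifier; the fixed difference kernel here only makes the temporal response easy to inspect.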

speech emotion recognition; amplitude filtering; hierarchical feature fusion strategy; dynamic feature map of Mel spectrogram

Yu Yongzhen (喻永振), Liu Daming (刘大明)


College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090


Shanghai Science and Technology Program project (No. 23010501500)

2024

Foreign Electronic Measurement Technology (国外电子测量技术)
Beijing Fanglue Information Technology Co., Ltd. (北京方略信息科技有限公司)


CSTPCD
Impact factor: 1.414
ISSN:1002-8978
Year, volume (issue): 2024, 43(3)