Speech emotion recognition based on amplitude filtering and hierarchical feature fusion strategy
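喻永振 (Yu Yongzhen) 1, 刘大明 (Liu Daming) 1
Author information
- 1. College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090, China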
Abstract
To address the low accuracy of speech emotion recognition on multilingual joint datasets, this paper proposes a method based on amplitude filtering and a hierarchical feature fusion strategy. The method first filters the Mel spectrogram according to its amplitude distribution: probability superposition enlarges the differences between similar amplitudes, giving strong gain at high frequencies and weak gain at low frequencies, while probability multiplication reduces the differences between widely separated amplitudes, revealing mid-frequency detail. Rectangular convolution then extracts the temporal dynamic features of the audio signal, producing dynamic feature maps of the Mel spectrogram that serve as input to the hierarchical feature fusion strategy. This strategy compresses the feature maps to extract temporal dynamic features at different scales, and also extracts temporal dynamic features at different depths. The proposed method achieves a classification accuracy of 84.44% on the multilingual joint dataset CER.
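The abstract gives no kernel sizes, pooling factors, or network details, so the following is only a minimal sketch of the two generic ideas it names: a rectangular (non-square) convolution along the time axis, and compression of the resulting feature map to several time scales before fusion. The 1 x 3 central-difference kernel, the average pooling, the per-band mean, and all function names (`rect_conv`, `compress_time`, `fuse_scales`) are assumptions made for illustration, not the authors' implementation. The amplitude filtering step is not sketched because its probability rules are not specified here.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows = Mel bands, columns = time frames


def rect_conv(spec: Matrix, kernel: Matrix) -> Matrix:
    """Valid 2-D cross-correlation with a (possibly non-square) kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(spec) - kh + 1
    out_w = len(spec[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * spec[r + i][c + j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def compress_time(fmap: Matrix, factor: int) -> Matrix:
    """Average-pool along the time axis by `factor` (trailing remainder dropped)."""
    n = len(fmap[0]) // factor
    return [
        [sum(row[k * factor:(k + 1) * factor]) / factor for k in range(n)]
        for row in fmap
    ]


def fuse_scales(fmap: Matrix, factors: Sequence[int] = (1, 2)) -> List[float]:
    """Concatenate the per-band mean of the feature map at several time scales.

    Each factor must not exceed the number of time frames in `fmap`.
    """
    fused: List[float] = []
    for f in factors:
        pooled = compress_time(fmap, f)
        fused.extend(sum(row) / len(row) for row in pooled)
    return fused


# Toy 2-band x 6-frame "Mel spectrogram" with made-up values.
spec = [
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    [5.0, 4.0, 3.0, 2.0, 1.0, 0.0],
]
# Assumed 1 x 3 kernel: a temporal central difference (one band, three frames).
delta_kernel = [[-1.0, 0.0, 1.0]]

dynamic_map = rect_conv(spec, delta_kernel)           # 2 bands x 4 frames
features = fuse_scales(dynamic_map, factors=(1, 2))   # 2 scales x 2 bands
```

In this toy input the first band rises and the second falls at a constant rate, so the dynamic map is constant within each band and the fused vector carries the same slope at both scales. On real speech the scales would differ, which is the motivation for combining them.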
Key words
speech emotion recognition/amplitude filtering/hierarchical feature fusion strategy/dynamic feature map of Mel spectrogram
Funding
Shanghai Science and Technology Plan Project (23010501500)
Publication year
2024