
Speech Emotion Recognition Based on ASGRU-CNN Spatiotemporal Dual Channel

Speech emotion recognition is key to human-computer interaction; improving recognition accuracy and extracting features that better represent emotion are among its open problems. To address them, a dual-channel spatiotemporal speech emotion recognition model, ASGRU-CNN, is constructed, comprising a spatial feature extraction module and a temporal feature extraction module. The overall framework consists of two parallel branches: the first branch is the spatial feature extraction module, a cascade of 3D convolution, 2D convolution, and pooling operations; the second branch is the temporal feature extraction module, built from a sliced recurrent neural network with embedded gated recurrent units and an attention mechanism. The model takes the fusion of prosodic and spectral features as input and, after processing by the two branches, feeds the result into a fully connected layer for speech emotion classification. Experiments were conducted on the CASIA and EMO-DB databases, with data augmentation used to enlarge the training set. Compared with the results of other speech emotion recognition models, the proposed model shows better robustness and generalization.
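The abstract describes the architecture only at a high level. Below is a minimal PyTorch sketch of how such a dual-channel layout might be wired: a spatial branch cascading 3D convolution, 2D convolution, and pooling; a temporal branch that slices the frame sequence, runs GRUs over the slices, and applies attention; and a fully connected classifier over the concatenated branch outputs. All layer widths, the number of slices, the folding of 3D-convolution output into 2D channels, and the six-class output (as in CASIA) are illustrative assumptions, not the authors' actual configuration.

```python
import torch
import torch.nn as nn


class SpatialBranch(nn.Module):
    """Cascade of 3D convolution, 2D convolution, and pooling (sizes assumed)."""

    def __init__(self, segments=4):
        super().__init__()
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        self.conv2d = nn.Sequential(
            nn.Conv2d(8 * segments, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )

    def forward(self, x):                 # x: (batch, 1, segments, time, freq)
        z = self.conv3d(x)                # (batch, 8, segments, time/2, freq/2)
        b, c, s, t, f = z.shape
        z = self.conv2d(z.reshape(b, c * s, t, f))  # fold segments into channels
        return z.flatten(1)               # (batch, 32*4*4) = (batch, 512)


class TemporalBranch(nn.Module):
    """Sliced recurrent structure with GRU sub-layers and additive attention."""

    def __init__(self, feat_dim=120, hidden=64, n_slices=4):
        super().__init__()
        self.n_slices = n_slices
        self.low_gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.top_gru = nn.GRU(hidden, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, T, feat_dim), T divisible by n_slices
        b, t, d = x.shape
        slices = x.reshape(b * self.n_slices, t // self.n_slices, d)
        _, h = self.low_gru(slices)       # final hidden state of each slice
        h = h.squeeze(0).reshape(b, self.n_slices, -1)
        out, _ = self.top_gru(h)          # higher-level GRU over slice summaries
        w = torch.softmax(self.attn(out), dim=1)
        return (w * out).sum(dim=1)       # attention-weighted summary: (batch, hidden)


class ASGRUCNN(nn.Module):
    """Two parallel branches fused before a fully connected classifier."""

    def __init__(self, n_classes=6):      # 6 emotion classes assumed (CASIA)
        super().__init__()
        self.spatial = SpatialBranch()
        self.temporal = TemporalBranch()
        self.classifier = nn.Sequential(
            nn.Linear(512 + 64, 128), nn.ReLU(), nn.Linear(128, n_classes))

    def forward(self, x_spec, x_seq):
        fused = torch.cat([self.spatial(x_spec), self.temporal(x_seq)], dim=1)
        return self.classifier(fused)


# Example usage on random tensors with the assumed shapes:
model = ASGRUCNN()
spec = torch.randn(2, 1, 4, 32, 32)   # segmented spectral "image" input
seq = torch.randn(2, 100, 120)        # frame-level fused prosodic/spectral features
print(model(spec, seq).shape)         # torch.Size([2, 6])
```

The two inputs here reflect one possible reading of the fusion-feature setup (a segmented time-frequency representation for the spatial branch and a frame-level feature sequence for the temporal branch); the paper itself should be consulted for the exact feature extraction and slicing scheme.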

Keywords: Speech emotion recognition; Fusion features; Sliced recurrent neural networks; Attention mechanism; Data augmentation

Gao Pengqi (高鹏淇), Huang Heming (黄鹤鸣)


School of Computer Science, Qinghai Normal University, Xining 810008, Qinghai, China

State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining 810008, Qinghai, China


Funding: National Natural Science Foundation of China (62066039); Natural Science Foundation of Qinghai Province (2022-ZJ-925)

Journal: Computer Simulation (计算机仿真)
Publisher: The 17th Research Institute of China Aerospace Science and Industry Corporation
Indexed in: CSTPCD
Impact factor: 0.518
ISSN: 1006-9348
Year, volume (issue): 2024, 41(4)