Speech Emotion Recognition with Lightweight Networks Incorporating Attention Mechanisms (融合注意力机制轻量级网络的语声情感识别)

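Abstract: Speech emotion recognition suffers from a lack of dialect databases and from low recognition accuracy. To address these problems, a Liaoxi dialect speech emotion database was built, and a speech emotion recognition model based on a lightweight network incorporating an attention mechanism was proposed. The model consists of four parts: a feature combination network, a CBAM attention mechanism, a deep convolutional network, and an output layer. Three parallel convolutions of different sizes extract shallow speech features, which are then concatenated; a CBAM attention module is introduced to fuse spatial features with channel features; the fused features are fed into the deep convolutional network, which extracts deep speech features and outputs a multi-dimensional feature vector; finally, the output layer classifies the speech by emotion. The model was validated on IEMOCAP, Emo-DB, and the self-built Liaoxi speech emotion database, achieving accuracies of 82.5%, 96.2%, and 90.8%, respectively. Experimental results show that, compared with other deep learning models, the proposed model achieves a higher recognition rate with fewer parameters.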
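The feature combination network described in the abstract (parallel convolutions of different sizes, then concatenation) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the kernel sizes 3, 5 and 7 are an assumption (the abstract only says the three convolutions differ in size), each branch produces a single output channel, and the hypothetical `mean_kernel` helper supplies fixed averaging weights in place of learned filters. Zero padding keeps every branch's output the same shape as the input spectrogram, so the three maps can be stacked along a channel axis, which is what concatenation means here.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def conv2d_same(x: Matrix, kernel: Matrix) -> Matrix:
    """2-D cross-correlation (as used in CNN layers) with zero padding.

    Assumes an odd-sized square kernel, so the output has the same
    height and width as the input.
    """
    k = len(kernel)
    pad = k // 2
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    # Taps that land in the zero padding contribute nothing.
                    if 0 <= r < h and 0 <= c < w:
                        acc += x[r][c] * kernel[di][dj]
            out[i][j] = acc
    return out


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def mean_kernel(k: int) -> Matrix:
    """Hypothetical fixed k x k averaging filter standing in for learned weights."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def feature_combination(spec: Matrix, kernel_sizes: Sequence[int] = (3, 5, 7)) -> List[Matrix]:
    """Parallel convolutions with different receptive fields, concatenated.

    Each branch yields one H x W map; stacking them gives a
    len(kernel_sizes) x H x W tensor (channel-wise concatenation).
    """
    return [relu(conv2d_same(spec, mean_kernel(k))) for k in kernel_sizes]


# Usage: a toy 9 x 9 "spectrogram" of ones.
spec = [[1.0] * 9 for _ in range(9)]
feats = feature_combination(spec)
print(len(feats), len(feats[0]), len(feats[0][0]))  # 3 9 9
```

With the all-ones input, every branch returns about 1.0 at the center, while the 3×3 branch returns 4/9 at a corner because five of its nine taps fall in the zero padding. The larger kernels see wider time-frequency context, which is the usual motivation for running several sizes in parallel.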

English title: Speech emotion recognition with lightweight networks incorporating attention mechanisms

English abstract: In speech emotion recognition, to address the lack of dialect databases and the low accuracy of recognition models, a speech emotion database of the Liaoxi dialect was established, and a speech emotion recognition model based on a lightweight network incorporating an attention mechanism was proposed. The model consists of four parts: a feature combination network, a CBAM attention mechanism, a deep convolutional network, and an output layer. Three parallel convolutions of different sizes extract shallow speech features, which are then concatenated. The CBAM attention module is introduced to refine these features. The fused features are fed into the deep convolutional network, which extracts deep speech features and outputs a multi-dimensional feature vector; the output layer then classifies the speech by emotion. The model was validated on IEMOCAP, Emo-DB, and the Liaoxi dialect speech emotion database, achieving accuracies of 82.5%, 96.2%, and 90.8%, respectively. Experimental results show that, compared with other deep learning models, the proposed model has fewer parameters and a higher recognition rate.
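The CBAM module named in the abstract applies channel attention and then spatial attention in sequence, as formulated in the original CBAM paper (Woo et al., ECCV 2018): a channel map M_c = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) rescales each channel, and a spatial map M_s = σ(conv([AvgPool_c(F′); MaxPool_c(F′)])) rescales each time-frequency position. Below is a minimal pure-Python sketch, assuming a C×H×W tensor stored as nested lists. The MLP weights `w1`, `w2` and the two 7×7 kernels are hypothetical arguments that a trained model would learn; the abstract does not state the reduction ratio or kernel size the authors used, and this sketch omits bias and normalization for brevity.

```python
import math
from typing import List

Matrix = List[List[float]]
Tensor = List[Matrix]  # C x H x W, channels first


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def conv2d_same(x: Matrix, kernel: Matrix) -> Matrix:
    """Zero-padded 2-D cross-correlation for an odd-sized square kernel."""
    k, h, w = len(kernel), len(x), len(x[0])
    pad = k // 2
    return [[sum(x[i + di - pad][j + dj - pad] * kernel[di][dj]
                 for di in range(k) for dj in range(k)
                 if 0 <= i + di - pad < h and 0 <= j + dj - pad < w)
             for j in range(w)] for i in range(h)]


def shared_mlp(v: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Two-layer MLP shared by both pooled descriptors: C -> C/r -> C, ReLU between."""
    hidden = [max(0.0, sum(a * b for a, b in zip(row, v))) for row in w1]
    return [sum(a * b for a, b in zip(row, hidden)) for row in w2]


def channel_attention(f: Tensor, w1: Matrix, w2: Matrix) -> Tensor:
    """M_c = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))); returns F' = M_c * F."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in f]
    mx = [max(map(max, ch)) for ch in f]
    scores = [a + b for a, b in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    weights = [sigmoid(s) for s in scores]
    return [[[v * wc for v in row] for row in ch] for ch, wc in zip(f, weights)]


def spatial_attention(f: Tensor, k_avg: Matrix, k_max: Matrix) -> Tensor:
    """M_s = sigmoid(conv([AvgPool_c(F'); MaxPool_c(F')])); returns F'' = M_s * F'."""
    c, h, w = len(f), len(f[0]), len(f[0][0])
    avg_map = [[sum(f[k][i][j] for k in range(c)) / c for j in range(w)] for i in range(h)]
    max_map = [[max(f[k][i][j] for k in range(c)) for j in range(w)] for i in range(h)]
    # A 2-in/1-out convolution equals the sum of two single-channel convolutions.
    a, m = conv2d_same(avg_map, k_avg), conv2d_same(max_map, k_max)
    mask = [[sigmoid(a[i][j] + m[i][j]) for j in range(w)] for i in range(h)]
    return [[[ch[i][j] * mask[i][j] for j in range(w)] for i in range(h)] for ch in f]


def cbam(f: Tensor, w1: Matrix, w2: Matrix, k_avg: Matrix, k_max: Matrix) -> Tensor:
    """Channel attention followed by spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(f, w1, w2), k_avg, k_max)


# Usage: C = 4 channels, reduction ratio r = 2, all-zero (untrained) weights.
C, H, W = 4, 5, 5
f = [[[float(k + i + j) for j in range(W)] for i in range(H)] for k in range(C)]
zeros7 = [[0.0] * 7 for _ in range(7)]
out = cbam(f, [[0.0] * C for _ in range(2)], [[0.0] * 2 for _ in range(C)], zeros7, zeros7)
print(out[1][2][3], f[1][2][3])  # 1.5 6.0
```

With all-zero weights every attention value is σ(0) = 0.5, so the module simply scales the input by 0.25; learned weights make that scaling vary by channel and by time-frequency position, which is how CBAM emphasizes the more informative parts of the speech features.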

English keywords:

Speech emotion recognition; Liaoxi dialect; Deep learning; Lightweight

Authors:

冀常鹏 (Ji Changpeng), 佟婷婷 (Tong Tingting), 代巍 (Dai Wei)

Affiliation:

School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105

Keywords:

Speech emotion recognition; Liaoxi dialect; Deep learning; Lightweight

Funding:

Project of the Liaoning Provincial Department of Science and Technology

Project number:

2019-ZD-0038

Publication year:

2024

DOI:

10.11684/j.issn.1000-310X.2024.04.022

应用声学 (Applied Acoustics)

Institute of Acoustics, Chinese Academy of Sciences

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 1.128

ISSN: 1000-310X

Year, volume (issue): 2024, 43(4)

Number of references: 2