Environmental Sound Recognition Based on Channel and Frame-level Feature Attention Model

To improve the recognition of environmental sounds, a convolutional neural network model based on channel and frame-level feature attention is proposed. In view of the characteristics of sound features, the model uses one-dimensional convolution to strengthen its extraction of sound feature information. An SE-Res2Net module is introduced to give the model a global receptive field over sound features at a fine-grained level and to help it attend to information across feature channels. An attention statistical pooling module is placed before the fully connected layer to strengthen the learning of the key frame-level features that characterize different sound categories, thereby improving recognition performance. On the UrbanSound8K dataset, the proposed model reaches an accuracy of 94.5% on the test set, indicating that it effectively learns the key information in sound features that distinguishes environmental sounds and makes correct predictions. Analysis of the ablation experiments shows that the model's design lowers the classification error rate by a relative 43.8%, demonstrating that both the use of one-dimensional convolution and the introduction of each module are effective.
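The two attention ideas named in the abstract, channel weighting and frame-level weighting, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: it assumes the standard squeeze-and-excitation gate and the common formulation of attention statistical pooling (often called attentive statistics pooling), and the function names, the weight matrices `w1` and `w2`, and the precomputed `scores` argument are hypothetical. In a trained network the frame scores would come from a small learned layer rather than being passed in.

```python
import math


def se_channel_weights(x, w1, w2):
    """Squeeze-and-excitation gate for x[c][t]; returns one weight in (0, 1) per channel.

    w1 (hidden x C) and w2 (C x hidden) are hypothetical learned matrices.
    """
    # Squeeze: average each channel over all frames.
    squeezed = [sum(ch) / len(ch) for ch in x]
    # Excitation, step 1: bottleneck projection followed by ReLU.
    hidden = [max(0.0, sum(w * z for w, z in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: project back to C channels and apply a sigmoid.
    return [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
            for row in w2]


def attentive_stats_pooling(x, scores):
    """Pool x[c][t] over frames with softmax(scores) as the frame weights.

    Returns the weighted mean of every channel followed by the weighted
    standard deviation of every channel (a list of length 2 * C).
    """
    # Numerically stable softmax over the frame scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]

    means, stds = [], []
    for ch in x:
        mu = sum(a * h for a, h in zip(alpha, ch))
        var = sum(a * h * h for a, h in zip(alpha, ch)) - mu * mu
        means.append(mu)
        # Clamp so rounding error cannot produce a negative variance.
        stds.append(math.sqrt(max(var, 1e-12)))
    return means + stds


if __name__ == "__main__":
    feats = [[1.0, 2.0, 3.0], [0.0, 0.0, 6.0]]  # 2 channels, 3 frames
    gates = se_channel_weights(feats, [[1.0, 0.0]], [[1.0], [-1.0]])
    reweighted = [[g * h for h in ch] for g, ch in zip(gates, feats)]
    print(attentive_stats_pooling(reweighted, [0.0, 0.0, 0.0]))
```

With equal scores the pooling reduces to the ordinary mean and standard deviation over frames. As one frame's score grows, the pooled vector moves toward that frame's features, which is how frame-level attention lets a few informative frames dominate the utterance-level representation.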

sound recognition; fine-grained; channel weighting; frame-level features; attention statistics pooling

Su Ruixuan (苏瑞轩), Ge Dongyuan (葛动元), Yao Xifan (姚锡凡)

School of Mechanical and Automotive Engineering, Guangxi University of Science and Technology, Liuzhou 54500

School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 51500

National Natural Science Foundation of China (51765007)

2024

Science Technology and Engineering
Chinese Society of Technology Economics

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.338
ISSN: 1671-1815
Year, volume (issue): 2024, 24(16)