Environmental Sound Classification Model Combining Swin Transformer and CNN

Environmental sound classification has become an important task in computer audition. It complements computer vision by helping devices better understand their surroundings and user needs, and it has broad application prospects with a positive impact on daily life. In recent years, Transformer models with self-attention mechanisms have been adopted for environmental sound classification. However, existing models require large amounts of memory and rely on pre-trained vision models, so they do not extract audio features well. To address these problems and improve classification accuracy, a new dual-branch environmental sound classification model, Swin Conformer, is proposed. The model combines a convolutional neural network with a Swin Transformer, which uses window-based self-attention; the features of the two branches are fused interactively, and a token semantic module is introduced. Results show that Swin Conformer achieves classification accuracies of 98.1% and 96.8% in validation on the public ESC-50 and UrbanSound8K datasets, respectively. This is higher than the accuracy of existing models, demonstrating the feasibility and superiority of the model for environmental sound classification.
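
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind window-based self-attention, not the authors' Swin Conformer. It assumes a hypothetical spectrogram feature map of shape (frequency, time, channels), a single attention head, and random projection matrices; shifted windows, relative position bias, the CNN branch, and the token semantic module are all omitted.

```python
import numpy as np

def window_partition(x, window):
    """Split an (H, W, C) feature map into non-overlapping windows of tokens."""
    h, w, c = x.shape
    assert h % window == 0 and w % window == 0, "map must tile evenly"
    x = x.reshape(h // window, window, w // window, window, c)
    # Group the two window-grid axes, then flatten each window into a token list.
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, c)

def window_self_attention(x, window, wq, wk, wv):
    """Single-head self-attention computed independently inside each window."""
    tokens = window_partition(x, window)              # (num_windows, N, C)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)      # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
spec_features = rng.standard_normal((8, 8, 16))       # hypothetical (freq, time, channels)
wq, wk, wv = (rng.standard_normal((16, 16)) * 0.1 for _ in range(3))
out, attn = window_self_attention(spec_features, 4, wq, wk, wv)
```

In this toy setting the attention weights occupy 4 windows × 16 × 16 entries rather than the 64 × 64 entries that global attention over all 64 positions would need, which illustrates why restricting attention to windows reduces memory.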

Keywords: environmental sound classification; data augmentation; Transformer; self-attention

Authors: Zhu Zhenfei (朱振飞), Ge Dongyuan (葛动元), Yao Xifan (姚锡凡), Su Ruixuan (苏瑞轩)

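School of Mechanical and Automotive Engineering, Guangxi University of Science and Technology, Liuzhou 545000

School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510000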

2024

Science Technology and Engineering (科学技术与工程)
Chinese Society of Technology Economics

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.338
ISSN: 1671-1815
Year, volume (issue): 2024, 24(28)