Environmental Sound Classification Based on Multiple Groups of Multi-resolution Features and Wavelet Channel Attention

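Chinese abstract (translated): Current environmental sound classification (ESC) methods capture too little time-frequency information during audio feature extraction. To address this, a classification method based on multiple groups of multi-resolution features and wavelet channel attention is proposed. The network input is a multi-feature composed of multiple groups of multi-resolution features; using multiple groups of filters and multiple frequency resolutions provides data augmentation along both the time and frequency dimensions while making the information complementary. To better weigh the importance of each channel, a wavelet channel attention module is designed for one-dimensional audio image features: the discrete wavelet transform (DWT) combines the low-frequency and high-frequency subbands of the signal to obtain channel scalars, Gram-Schmidt orthogonalization diversifies the information the network extracts in the squeeze stage of channel attention, and a long short-term memory (LSTM) network retains information over long spans, improving the long-term reliability of learning. Experiments show classification accuracies of 98.7% on ESC-10 and 93.6% on ESC-50, and the approach offers a new line of research for audio feature processing.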
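The abstract's idea of feeding the network features at several frequency resolutions can be illustrated with a minimal sketch. It assumes, purely for illustration, that each resolution is an STFT magnitude spectrogram computed with a different Hann window length and a hop of half the window; the paper's actual filter groups and parameters are not given in this record, and the names `magnitude_spectrogram` and `multi_resolution_features` are hypothetical.

```python
import cmath
import math


def magnitude_spectrogram(signal, win_len, hop):
    """Naive STFT magnitude: one list of win_len // 2 + 1 bins per frame."""
    # Periodic Hann window (an illustrative choice, not taken from the paper).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(win_len)]
        bins = []
        for k in range(win_len // 2 + 1):
            # Direct DFT of the windowed frame at frequency bin k.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / win_len)
                for n in range(win_len)
            )
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def multi_resolution_features(signal, win_lens=(32, 64, 128)):
    """One spectrogram per window length; hop = half the window (assumption)."""
    return {w: magnitude_spectrogram(signal, w, w // 2) for w in win_lens}


# Usage: a pure tone at 1/8 cycles per sample, 256 samples long.
tone = [math.sin(2 * math.pi * n / 8) for n in range(256)]
features = multi_resolution_features(tone)
```

Longer windows give finer frequency resolution but fewer frames, while shorter windows give the reverse, which is one way the different resolutions can carry complementary information.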

English title: Environmental Sound Classification Based on Multiple Groups of Multi-resolution Features and Wavelet Channel Attention

English abstract: To address the insufficient time-frequency information captured during audio feature extraction in current Environmental Sound Classification (ESC) methods, a classification method based on multiple groups of multi-resolution features and wavelet channel attention is proposed. Multiple groups of multi-resolution features are used as network inputs, and data augmentation is achieved in both the time and frequency dimensions through multiple groups of filters and multiple frequency resolutions, while information complementarity is also achieved. To better measure the importance of each channel, a wavelet channel attention module is designed for one-dimensional audio image features. The Discrete Wavelet Transform (DWT) is used to effectively combine the low-frequency and high-frequency subbands of the signal to obtain channel scalars. The Gram-Schmidt orthogonalization method is used to diversify the information extracted by the network during the channel attention compression stage. A Long Short-Term Memory (LSTM) network is used to retain information over long periods and improve the long-term reliability of learning. The experimental results show that the classification accuracies on the ESC-10 and ESC-50 datasets reach 98.7% and 93.6%, respectively, achieving good results and providing a new research approach for audio feature processing.
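The two building blocks the abstract names for the attention module, a DWT that splits a signal into low- and high-frequency subbands and Gram-Schmidt orthogonalization, can be sketched as follows. This is a hedged illustration, not the authors' module: the Haar wavelet, the blending weight `alpha`, and the function names `haar_dwt`, `gram_schmidt`, and `channel_scalar` are all assumptions introduced here, since this record does not specify how the subbands are combined or what is orthogonalized.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT: returns (low-frequency, high-frequency) subbands."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last sample
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high


def gram_schmidt(vectors, eps=1e-12):
    """Orthonormalise vectors (modified Gram-Schmidt), dropping dependent ones."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def channel_scalar(feature, alpha=0.5):
    """Hypothetical squeeze: blend the mean low band with the mean |high band|."""
    low, high = haar_dwt(feature)
    low_term = sum(low) / len(low)
    high_term = sum(abs(h) for h in high) / len(high)
    return alpha * low_term + (1.0 - alpha) * high_term


# Usage: a constant channel has no high-frequency content.
low, high = haar_dwt([1.0, 1.0, 1.0, 1.0])
basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
scalar = channel_scalar([1.0, 1.0, 1.0, 1.0])
```

In a channel attention setting, one plausible use of the orthonormal basis is as a set of squeeze filters, so that different channels are summarized along mutually orthogonal directions rather than by the same global average.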

English keywords:

ESC; multiple groups of multi-resolution features; wavelet channel attention; LSTM network

Authors:

Li Jun (李军), Wang Ziren (王子壬), Xiang Yanbo (向彦伯), Niu Yan (钮焱)

Affiliation:

School of Computer Science, Hubei University of Technology, Wuhan, Hubei 430068, China

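Keywords:

environmental sound classification; multiple groups of multi-resolution features; wavelet channel attention; long short-term memory network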

Funding:

National Natural Science Foundation of China; Hubei Provincial Teaching and Research Project

Grant numbers:

61902116; 2020454

Publication year:

2024

DOI:

10.3969/j.issn.1003-3106.2024.08.004

Radio Engineering (无线电工程)

The 54th Research Institute of China Electronics Technology Group Corporation

Impact factor: 0.667

ISSN: 1003-3106

Year, volume (issue): 2024, 54(8)

Number of references: 5