Environmental Sound Recognition Based on Channel and Frame-level Feature Attention Model
In order to improve the recognition of environmental sounds,a convolutional neural network model for environmental sound recognitionwas proposed.This model was based on channel and frame-level feature attention.This model utilized one-dimension-al convolution to enhance the model's capacity to extract sound feature information,leveraging the specific characteristics of sound fea-tures.The SE-Res2Net module was introduced to enhance the global perception of sound features at a fine-grained level and assist the model focus on information among feature channels.An attention statistical pooling module was introduced before the fully connected layer to enhance the learning of key frame-level features representing different sound categories and improve the models recognition per-formance.Using the UrbanSound8K dataset,experimental results show that the proposed model achieves a training accuracy of 94.5%on the test set.It is indicated that the model can effectively learn key information representing various environmental sounds in sound features and make accurate predictions.Analysis of the ablation experiment results indicate that the proposed model's design can de-crease the classification error rate by 43.8%,demonstrating the effectiveness of applying one-dimensional convolution and introducing various modules.The performance of the proposed environmental sound recognition model is superior.