Small-sample optimized bird sound recognition network based on a bridging Transformer

Bird sound audio collected in real monitoring environments is unevenly distributed, which leaves neural networks insufficiently trained and lowers classification accuracy at test time. To address this problem, a bridging Transformer neural network model is designed. The network first generates short-time Fourier transform (STFT) spectrograms from the raw bird sound audio signals as input features. The spectrograms are then fed into a Transformer network in which attention modules and convolution modules are bridged together, enabling information exchange between the global and local features of the spectrogram. Finally, a single-layer Transformer encoder is used to optimize the loss over the samples in each batch, yielding the final classification result. In small-sample experiments on the Birdsdata and xeno-canto bird sound datasets, the network achieved average accuracies of 91.34% and 82.63%, respectively. Comparative experiments against other bird sound recognition networks verified its effectiveness.
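The first stage described in the abstract, turning a raw audio signal into an STFT spectrogram, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `stft_spectrogram`, the Hann window, the frame length of 512, the hop of 128, the log-magnitude scaling, and the 16 kHz synthetic test tone are all assumptions chosen for illustration, since the abstract does not state these parameters.

```python
import numpy as np


def stft_spectrogram(signal, frame_len=512, hop=128):
    """Return a log-magnitude STFT spectrogram of shape (freq_bins, frames).

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Zero-pad very short signals so that at least one full frame exists.
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))

    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop

    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])

    # One-sided FFT per frame: shape (n_frames, frame_len // 2 + 1).
    spectrum = np.fft.rfft(frames, axis=1)

    # Log magnitude in dB; the small constant avoids log(0).
    log_mag = 20.0 * np.log10(np.abs(spectrum) + 1e-10)
    return log_mag.T


# Usage: a synthetic 2 kHz tone stands in for a bird sound recording.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 2000 * t)

spec = stft_spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 512
print(spec.shape, peak_hz)
```

The resulting 2-D array is the kind of time-frequency input that a network combining attention (global context) and convolution (local patterns) would consume. How the paper actually normalizes or resizes the spectrogram is not stated in the abstract.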

Keywords: bird sound recognition; attention mechanism; convolution module; Transformer network

Wang Jihao (王基豪), Zhou Xiaoyan (周晓彦), Han Zhichao (韩智超), Wang Lili (王丽丽)

School of Electronics and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 211800

2024

Journal of Applied Acoustics (应用声学)
Institute of Acoustics, Chinese Academy of Sciences

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.128
ISSN: 1000-310X
Year, volume (issue): 2024, 43(3)