
Bird call recognition method based on feature fusion and attention mechanism

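Bird call recognition is a technique that takes a variety of preprocessed bird sounds as input and identifies the corresponding bird species with a network model. To address two problems in bird call recognition under real-world conditions, namely the limitations of any single audio feature and the weak feature-learning ability of the models, this paper proposes a bird call recognition method based on feature fusion and an attention mechanism. First, Mel-frequency cepstral coefficients (MFCC) and power-normalized cepstral coefficients (PNCC) are extracted separately during feature extraction. Second, the two features are fused using mean and variance normalization to obtain a new fused feature, MPFC. Then, with ResNet-50 as the backbone, a lightweight coordinate attention mechanism is introduced into its residual modules, yielding an improved network model, the residual coordinate attention network (ResCA). Finally, the fused features are fed separately into ResCA, ResNet-50, ResNeSt-50, DenseNet-121 and EfficientNet-B0, and comparative experiments are conducted on two datasets, Birdsdata and BirdCLEF. The experimental results show that the fused feature has better representational ability than either single feature and improves the recognition rate to some extent, and that the improved network also achieves good recognition performance.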
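The mean-and-variance-normalization fusion step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `mean_variance_normalize` and `fuse_mpfc` are hypothetical, each feature is assumed to be a frames × coefficients matrix already produced by some MFCC or PNCC front end, normalization is assumed to be per coefficient over time, and the two normalized features are assumed to be joined by concatenating them frame by frame (the abstract does not say how they are combined after normalization).

```python
from statistics import fmean, pstdev

Matrix = list[list[float]]  # frames x coefficients


def mean_variance_normalize(feat: Matrix, eps: float = 1e-8) -> Matrix:
    """Scale each coefficient (column) to zero mean and unit variance over all frames."""
    n_coeffs = len(feat[0])
    columns = [[frame[j] for frame in feat] for j in range(n_coeffs)]
    stats = [(fmean(col), pstdev(col)) for col in columns]
    return [
        [(frame[j] - mean) / (std + eps) for j, (mean, std) in enumerate(stats)]
        for frame in feat
    ]


def fuse_mpfc(mfcc: Matrix, pncc: Matrix) -> Matrix:
    """Hypothetical MPFC-style fusion: normalize each feature, then concatenate per frame."""
    if len(mfcc) != len(pncc):
        raise ValueError("MFCC and PNCC must have the same number of frames")
    norm_mfcc = mean_variance_normalize(mfcc)
    norm_pncc = mean_variance_normalize(pncc)
    return [m + p for m, p in zip(norm_mfcc, norm_pncc)]


# Toy example: 2 frames, 2 MFCC coefficients and 1 PNCC coefficient per frame.
fused = fuse_mpfc([[1.0, 10.0], [3.0, 30.0]], [[0.5], [1.5]])
print([[round(v, 6) for v in frame] for frame in fused])
# prints [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]
```

Normalizing before combining puts the log-compressed MFCC values and the power-law-compressed PNCC values on a comparable scale, so neither feature dominates the fused vector merely because of its numeric range.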
Bird call recognition based on feature fusion and attention mechanism
Bird call recognition is a technique that takes a variety of preprocessed bird sounds as input and identifies the corresponding bird species with a network model. In real natural environments, recognition based on a single audio feature is limited because one feature extracted during preprocessing cannot fully describe the characteristics of bird calls, and the feature-learning ability of the network model is poor. In this paper, a bird call recognition method based on feature fusion and an attention mechanism is presented. First, Mel frequency cepstral coefficients and power-normalized cepstral coefficients are extracted during feature extraction in the bird call preprocessing stage. Second, the two features are fused using mean and variance normalization to obtain a new fused feature called MPFC. Then, with ResNet-50 as the backbone network, a coordinate attention mechanism is inserted into its residual modules, yielding an improved attention residual network model called ResCA. Finally, the fused features are fed separately into ResCA, ResNet-50, ResNeSt-50, DenseNet-121 and EfficientNet-B0 for comparison on two datasets, Birdsdata and BirdCLEF. The results show that the fused feature has better representational ability than either single feature and improves the recognition rate, and that the improved network also achieves better recognition performance.
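A coordinate attention module of the kind inserted into the ResNet-50 residual modules above can be sketched in plain Python. The sketch follows the commonly published coordinate attention design (average pooling along height and along width separately, a shared 1×1 convolution with an h-swish activation, then two 1×1 convolutions with sigmoid gates that reweight the input per row and per column), but the specifics are assumptions: the names `CoordinateAttention`, `conv1x1` and `residual_ca_block` are hypothetical, the reduction ratio of 32 with a minimum of 8 intermediate channels is an illustrative default, batch normalization is omitted, the weights are random rather than trained, and `residual_ca_block` places the attention on the residual branch just before the skip addition, which the abstract does not confirm.

```python
import math
import random
from typing import Callable

Matrix = list[list[float]]          # rows x length
Tensor3 = list[list[list[float]]]   # channels x height x width


def conv1x1(weight: Matrix, bias: list[float], x: Matrix) -> Matrix:
    """1x1 convolution on an (in_channels x length) map: out[o][l] = sum_i w[o][i] * x[i][l] + b[o]."""
    length = len(x[0])
    return [
        [sum(w_oi * x[i][l] for i, w_oi in enumerate(w_o)) + b_o for l in range(length)]
        for w_o, b_o in zip(weight, bias)
    ]


def h_swish(v: float) -> float:
    """Hard swish: v * ReLU6(v + 3) / 6."""
    return v * min(max(v + 3.0, 0.0), 6.0) / 6.0


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


class CoordinateAttention:
    """Hypothetical coordinate attention sketch (untrained random weights, no batch norm)."""

    def __init__(self, channels: int, reduction: int = 32, seed: int = 0) -> None:
        rng = random.Random(seed)
        mip = max(8, channels // reduction)  # intermediate channel count (assumed default)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_shared, self.b_shared = init(mip, channels), [0.0] * mip
        self.w_h, self.b_h = init(channels, mip), [0.0] * channels
        self.w_w, self.b_w = init(channels, mip), [0.0] * channels

    def __call__(self, x: Tensor3) -> Tensor3:
        c, h, w = len(x), len(x[0]), len(x[0][0])
        # Direction-aware pooling: average over width (one value per row) and over height (one per column).
        pooled_h = [[sum(row) / w for row in ch] for ch in x]                             # C x H
        pooled_w = [[sum(ch[i][j] for i in range(h)) / h for j in range(w)] for ch in x]  # C x W
        joint = [ph + pw for ph, pw in zip(pooled_h, pooled_w)]                           # C x (H+W)
        # Shared 1x1 conv + h-swish on the concatenated descriptors, then split back into H and W parts.
        y = [[h_swish(v) for v in row] for row in conv1x1(self.w_shared, self.b_shared, joint)]
        y_h = [row[:h] for row in y]
        y_w = [row[h:] for row in y]
        # Separate 1x1 convs + sigmoid give per-row and per-column gates for every channel.
        a_h = [[sigmoid(v) for v in row] for row in conv1x1(self.w_h, self.b_h, y_h)]  # C x H
        a_w = [[sigmoid(v) for v in row] for row in conv1x1(self.w_w, self.b_w, y_w)]  # C x W
        return [[[x[k][i][j] * a_h[k][i] * a_w[k][j] for j in range(w)]
                 for i in range(h)] for k in range(c)]


def residual_ca_block(x: Tensor3, body: Callable[[Tensor3], Tensor3],
                      ca: CoordinateAttention) -> Tensor3:
    """Hypothetical placement: attention on the residual branch, then skip addition and ReLU."""
    out = ca(body(x))
    return [[[max(0.0, a + b) for a, b in zip(r_x, r_o)] for r_x, r_o in zip(c_x, c_o)]
            for c_x, c_o in zip(x, out)]
```

Because the height and width gates are computed separately, the module retains positional information along both axes of a 2D feature map while adding only a few 1×1 convolutions, which is why coordinate attention is described as lightweight.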

Keywords: bird call recognition; feature fusion; Mel frequency cepstrum coefficient; power-normalized cepstral coefficient

Authors: 潘齐炜 (Pan Qiwei), 程吉祥 (Cheng Jixiang), 田甜 (Tian Tian), 吴丹 (Wu Dan), 曾蕊 (Zeng Rui)


School of Electrical Engineering and Information, Southwest Petroleum University, Chengdu, Sichuan 610500, China


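Funding: National Natural Science Foundation of China (61603319, 61601385); Youth Science and Technology Innovation Cultivation Team for Intelligent Control and Image Processing, Southwest Petroleum University (2017CXTD010)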

2024

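Technical Acoustics (声学技术)
Sponsors: East China Sea Research Station, Institute of Acoustics, Chinese Academy of Sciences; Institute of Acoustics, Tongji University; Acoustical Society of Shanghai; Shanghai Marine Electronic Equipment Research Institute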


Indexed in: CSTPCD; Peking University Core Journals list
Impact factor: 0.415
ISSN: 1000-3630
Year, volume (issue): 2024, 43(5)