Wetland soundscape recording scheme and feature selection for soundscape classification
Aims: Soundscapes describe the spatial and temporal patterns of biodiversity, human activities, and other sounds, and reflect important anthropogenic and ecological processes. Soundscape classification improves the accuracy with which individual soundscape components can be computed and analyzed. It also deepens our understanding of the characteristics and distribution of different sounds, including the species composition of an ecosystem, and so provides a basis for protecting and improving the ecological environment. However, the large volume of recordings collected by passive acoustic devices makes soundscape data difficult to analyze. This study therefore aims to identify an efficient recording scheme that balances the amount of sampled data against the cost of sampling, while still meeting the needs of soundscape classification research.
Methods: Using recordings from Yeyahu Wetland Park, Beijing, this study compares two kinds of features under different recording schemes. The first is seven acoustic indices: the acoustic complexity index (ACI), acoustic diversity index (ADI), acoustic evenness index (AEI), bioacoustic index (BIO), acoustic entropy index (H), median of the amplitude envelope (M), and normalized difference soundscape index (NDSI). The second is BYOL-A (bootstrap your own latent for audio) features. The aim is to identify recording schemes and acoustic features suited to classifying soundscapes into biophony, geophony, and anthrophony.
Results: (1) Collecting ten evenly spaced 1-min sub-samples per hour captured soundscape information effectively while balancing data volume and cost (Spearman's ρ > 0.9). (2) Among the acoustic indices, ACI and H were the most stable. (3) BYOL-A features classified soundscapes more effectively than the acoustic indices.
Conclusion: An appropriate recording scheme combined with high-performance deep learning features such as BYOL-A can capture soundscape information quickly and improve the accuracy of soundscape classification. These results can serve as a reference for soundscape data collection and acoustic feature selection in future research.
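The abstract does not say exactly how the recording schemes were compared against continuous recording. The following is a minimal Python sketch of one plausible comparison, under stated assumptions, not the study's pipeline:

- A per-minute acoustic index value (for example ACI) is already available for every minute of every hour.
- Each hour is summarized by its mean.
- The scheme is judged by Spearman's ρ between the hourly means from all 60 minutes and those from ten evenly spaced 1-min sub-samples.

The function names (`uniform_minute_offsets`, `average_ranks`, `spearman_rho`, `hourly_means`) and the synthetic data are hypothetical and for illustration only.

```python
import math
import random


def uniform_minute_offsets(k, minutes_per_hour=60):
    """Start minutes of k evenly spaced 1-min sub-samples within one hour (assumed scheme)."""
    if not 1 <= k <= minutes_per_hour:
        raise ValueError("k must be between 1 and minutes_per_hour")
    step = minutes_per_hour / k
    return [int(i * step) for i in range(k)]


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined for a constant series")
    return cov / (sx * sy)


def hourly_means(per_minute, offsets=None):
    """Mean index value per hour, from all minutes or only the sampled offsets."""
    means = []
    for hour in per_minute:
        picked = hour if offsets is None else [hour[m] for m in offsets]
        means.append(sum(picked) / len(picked))
    return means


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical synthetic data (NOT the Yeyahu recordings): 24 hours x 60
    # per-minute index values following a daily cycle plus Gaussian noise.
    per_minute = [
        [math.sin(2 * math.pi * h / 24) + rng.gauss(0, 0.3) for _ in range(60)]
        for h in range(24)
    ]
    full = hourly_means(per_minute)
    sub = hourly_means(per_minute, uniform_minute_offsets(10))
    print(f"Spearman rho (10 x 1-min vs. continuous): {spearman_rho(full, sub):.3f}")
```

Evenly spaced offsets (minutes 0, 6, ..., 54 for k = 10) are used here because they spread the sub-samples across the whole hour. Random offsets within each hour would be an equally simple alternative to test the same way.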