

Animal Voiceprint Recognition Based on ERes-ECAM
Voiceprint recognition technology is not only widely used for human identity verification but has also made some progress in animal species recognition. Existing models suffer from insufficient feature expression ability, and their time complexity and inference speed still need to be optimized without compromising performance. This paper proposes Enhanced Res2block connected Enhanced Context Aware Masking (ERes-ECAM), a novel architecture for vocal animal embedding learning that adopts a Densely-connected Time Delay Neural Network (D-TDNN) as the backbone. To suppress irrelevant noise while extracting more effective key information, an Enhanced Context Aware Masking (ECAM) module with a multi-granularity pooling method is used in the D-TDNN layers. A residual module is connected at the front end, and the features extracted within the residual block are fused through Local Feature Fusion (LFF) to capture local information, which improves the accuracy and robustness of the voiceprint verification system. Experiments were conducted on two test sets, Anim-Celeb and Pig-Celeb. The results show that the proposed architecture achieves Equal Error Rates (EER) of 6.88% and 7.24%, respectively, and accuracies of 93.12% and 92.76% in recognizing animal species and pig species, respectively.
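The abstract does not describe the internals of the ECAM module, so the following is only a rough NumPy sketch of the general context-aware masking (CAM) idea used in D-TDNN-based speaker verification (for example, the CAM module of CAM++), not the authors' implementation. A context vector built by pooling at two granularities (a global mean over all frames plus a mean over fixed-length segments) drives a small bottleneck network whose sigmoid output rescales every channel of every frame, so that frames dominated by irrelevant noise can be attenuated. The function names, segment length, reduction ratio, and random untrained weights are all hypothetical choices for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def multi_granularity_context(h, seg_len):
    """Context for each frame: global mean over all frames plus the mean of its segment.

    h: (C, T) frame-level features, e.g. the output of a D-TDNN layer (hypothetical input).
    """
    global_ctx = h.mean(axis=1, keepdims=True)  # (C, 1), broadcast to every frame
    seg_ctx = np.empty_like(h)
    for start in range(0, h.shape[1], seg_len):
        end = min(start + seg_len, h.shape[1])
        # every frame in [start, end) shares the mean of its segment
        seg_ctx[:, start:end] = h[:, start:end].mean(axis=1, keepdims=True)
    return global_ctx + seg_ctx  # (C, T)


def context_aware_mask(h, w1, b1, w2, b2, seg_len=4):
    """Rescale h by a sigmoid mask predicted from the multi-granularity context.

    w1: (C // r, C) and w2: (C, C // r) are bottleneck weights (hypothetical, untrained).
    Returns the masked features and the mask, both of shape (C, T).
    """
    e = multi_granularity_context(h, seg_len)
    hidden = np.maximum(0.0, w1 @ e + b1[:, None])  # ReLU bottleneck, (C // r, T)
    mask = sigmoid(w2 @ hidden + b2[:, None])       # values in (0, 1), (C, T)
    return h * mask, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, T, r = 8, 10, 2                     # channels, frames, reduction ratio (arbitrary)
    h = rng.standard_normal((C, T))
    w1 = 0.1 * rng.standard_normal((C // r, C))
    w2 = 0.1 * rng.standard_normal((C, C // r))
    y, mask = context_aware_mask(h, w1, np.zeros(C // r), w2, np.zeros(C))
    print(y.shape)                          # (8, 10)
```

In a trained model the bottleneck weights would be learned, and the mask would be applied inside each D-TDNN layer; how the paper's "enhanced" variant differs from this basic form is not stated in the abstract.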

Keywords: deep learning; voiceprint recognition; context aware masking; local feature fusion (LFF); animal species recognition

Authors: Hou Weimin (侯卫民), Sun Yifei (孙艺菲), Liu Juntao (刘峻滔)


School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China


2024

Radio Communications Technology
Sponsor: The 54th Research Institute of China Electronics Technology Group Corporation


Peking University Core Journal (北大核心)
Impact factor: 0.745
ISSN:1003-3114
Year, Volume (Issue): 2024, 50(4)