
A Text-Dependent Speaker Verification Framework Based on Transfer Learning and Fundamental Frequency Feature Fusion

At present, speaker verification technology for financial payment in China has not been widely adopted at the societal level, owing to the lack of suitable datasets and to recognition techniques that do not yet meet security requirements. To address these problems, this paper records SHALCAS-WXSD22B, a dataset for Chinese digit-string text-dependent speaker verification intended for research on digit-string voiceprint recognition in financial payment scenarios, and proposes a text-dependent speaker verification framework based on transfer learning and fundamental frequency feature fusion, improving the reliability of text-dependent speaker verification. In experiments on the digit-string corpora SHALCAS-WXSD22B-d006 and SHALCAS-WXSD22B-d007, the proposed framework achieves best equal error rates of 0.88% and 1.05%, relative reductions of 17% and 20% compared with the ECAPA-TDNN baseline model, and meets the voiceprint-recognition security requirements of payment scenarios. The experimental results show that the proposed framework not only achieves better recognition accuracy and security, but also improves the performance of other log-Mel recognition models within the framework, including ResNet34.
A text-dependent speaker verification framework based on transfer learning and fundamental frequency feature fusion
Speaker verification technology for financial payments in China has not been widely promoted at the societal level, due to the lack of datasets and the insufficient security of existing models. In this paper, a text-dependent speaker verification framework based on transfer learning and fundamental frequency feature fusion is proposed to address these problems, evaluated on the self-recorded SHALCAS-WXSD22B dataset. In experiments on the digit-string corpora SHALCAS-WXSD22B-d006 and SHALCAS-WXSD22B-d007, the best equal error rates achieved by the proposed framework are 0.88% and 1.05%. Compared with the ECAPA-TDNN baseline model, these correspond to relative reductions in equal error rate of 17% and 20%, respectively, and meet the security indicators for financial payment scenarios. The experimental results show that the proposed method not only achieves better recognition accuracy and security than the baseline methods, but can also be applied to other log-Mel models, including ResNet34.
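The equal error rate (EER) figures quoted above can be reproduced from raw trial scores. A minimal sketch (the score values below are illustrative, not from the paper, and the abstract does not report the baseline EERs used in the relative-reduction comparison) finds the threshold where the false-accept and false-reject rates meet, and shows the relative-reduction arithmetic behind claims like "20% relative":

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """EER: the operating point where the false-accept rate
    (impostor trials scored at or above threshold) equals the
    false-reject rate (genuine trials scored below threshold)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, None
    for t in thresholds:
        far = np.mean(impostor >= t)  # impostors accepted
        frr = np.mean(genuine < t)    # genuine speakers rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, e.g. vs. an ECAPA-TDNN baseline."""
    return (baseline_eer - new_eer) / baseline_eer

# Illustrative trial scores only; perfectly separated trials give EER = 0.
genuine = np.array([0.91, 0.85, 0.78])
impostor = np.array([0.12, 0.30, 0.41])
print(equal_error_rate(genuine, impostor))  # 0.0
# A hypothetical baseline EER of 1.10% reduced to 0.88% is a 20% relative drop.
print(relative_reduction(0.0110, 0.0088))   # ~0.2
```

In practice the genuine/impostor scores would come from cosine similarity between enrollment and test embeddings; the sweep over observed scores is a simple stand-in for interpolating the ROC curve.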

text-dependent speaker verification; transfer learning; fundamental frequency feature; embedding-level fusion; decision-level fusion

马皓天、洪峰、毛海全、徐楚林、胡梦璐、牟宏宇、陈友元、许伟杰


East China Sea Research Station, Institute of Acoustics, Chinese Academy of Sciences, Shanghai 201815, China

University of Chinese Academy of Sciences, Beijing 100190, China

Keywords: text-dependent speaker verification; transfer learning; fundamental frequency feature; embedding-level fusion; decision-level fusion

Funding: "Frontier Exploration" Independent Deployment Project of the Institute of Acoustics, Chinese Academy of Sciences; Youth Innovation Promotion Association, Chinese Academy of Sciences; Natural Science Foundation of Shanghai

QYTS202114; 2021022; 22ZR1475700

2024

Technical Acoustics (声学技术)
East China Sea Research Station of the Institute of Acoustics, Chinese Academy of Sciences; Institute of Acoustics, Tongji University; Shanghai Acoustical Society; Shanghai Marine Electronic Equipment Research Institute


Indexed in: CSTPCD; PKU Core Journals (北大核心)
Impact factor: 0.415
ISSN: 1000-3630
Year, volume (issue): 2024, 43(5)