
A Text-Dependent Speaker Verification Framework Based on Transfer Learning and Fundamental Frequency Feature Fusion

At present, speaker verification technology for financial payment in China has not been widely adopted at the societal level, owing to the lack of suitable datasets and to recognition techniques that do not yet meet security requirements. To address these problems, this paper records SHALCAS-WXSD22B, a dataset for Chinese digit-string text-dependent speaker verification intended for research on digit-string voiceprint recognition in financial payment scenarios, and proposes a text-dependent speaker verification framework based on transfer learning and fundamental frequency feature fusion, improving the reliability of text-dependent speaker verification. In experiments on the digit-string corpora SHALCAS-WXSD22B-d006 and SHALCAS-WXSD22B-d007, the proposed framework achieves best equal error rates of 0.88% and 1.05%, relative reductions of 17% and 20% compared with the ECAPA-TDNN baseline model, and meets the voiceprint-recognition security requirements of payment scenarios. The experimental results show that the proposed framework not only achieves better recognition accuracy and security, but also improves the performance of other log-Mel recognition models within the framework, including ResNet34.
A text-dependent speaker verification framework based on transfer learning and fundamental frequency feature fusion
Speaker verification technology for financial payments in China has not been widely promoted at the societal level, due to the lack of datasets and the insufficient security of existing models. In this paper, a text-dependent speaker verification framework based on transfer learning and fundamental frequency feature fusion is proposed to address these problems, evaluated on the self-recorded SHALCAS-WXSD22B dataset. In experiments on the digit-string corpora SHALCAS-WXSD22B-d006 and SHALCAS-WXSD22B-d007, the best equal error rates achieved by the proposed framework are 0.88% and 1.05%. Compared with the ECAPA-TDNN baseline model, these correspond to relative reductions in equal error rate of 17% and 20%, respectively, and meet the security indicators for financial payment scenarios. The experimental results show that the proposed method not only achieves better recognition accuracy and security than the baseline methods, but can also be applied to other log-Mel models, including ResNet34.
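The equal error rate (EER) figures quoted above can be reproduced from raw trial scores. A minimal sketch (the score values below are illustrative, not from the paper, and the abstract does not report the baseline EERs used in the relative-reduction comparison) finds the threshold where the false-accept and false-reject rates meet, and shows the relative-reduction arithmetic behind claims like "20% relative":

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """EER: the operating point where the false-accept rate
    (impostor trials scored at or above threshold) equals the
    false-reject rate (genuine trials scored below threshold)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, None
    for t in thresholds:
        far = np.mean(impostor >= t)  # impostors accepted
        frr = np.mean(genuine < t)    # genuine speakers rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, e.g. vs. an ECAPA-TDNN baseline."""
    return (baseline_eer - new_eer) / baseline_eer

# Illustrative trial scores only; perfectly separated trials give EER = 0.
genuine = np.array([0.91, 0.85, 0.78])
impostor = np.array([0.12, 0.30, 0.41])
print(equal_error_rate(genuine, impostor))  # 0.0
# A hypothetical baseline EER of 1.10% reduced to 0.88% is a 20% relative drop.
print(relative_reduction(0.0110, 0.0088))   # ~0.2
```

In practice the genuine/impostor scores would come from cosine similarity between enrollment and test embeddings; the sweep over observed scores is a simple stand-in for interpolating the ROC curve.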

text-dependent speaker verification; transfer learning; fundamental frequency feature; embedding-level fusion; decision-level fusion

马皓天、洪峰、毛海全、徐楚林、胡梦璐、牟宏宇、陈友元、许伟杰


East China Sea Research Station, Institute of Acoustics, Chinese Academy of Sciences, Shanghai 201815, China

University of Chinese Academy of Sciences, Beijing 100190, China

Keywords: text-dependent speaker verification; transfer learning; fundamental frequency feature; embedding-level fusion; decision-level fusion

Funding: "Frontier Exploration" Independent Deployment Project of the Institute of Acoustics, Chinese Academy of Sciences; Youth Innovation Promotion Association, Chinese Academy of Sciences; Natural Science Foundation of Shanghai

QYTS202114; 2021022; 22ZR1475700

2024

Technical Acoustics (声学技术)
East China Sea Research Station of the Institute of Acoustics, Chinese Academy of Sciences; Institute of Acoustics, Tongji University; Shanghai Acoustical Society; Shanghai Marine Electronic Equipment Research Institute


Indexed in: CSTPCD; PKU Core Journals (北大核心)
Impact factor: 0.415
ISSN: 1000-3630
Year, volume (issue): 2024, 43(5)