Speaker verification system utilizing global-local feature dependency for anti-spoofing

Existing anti-spoofing speaker verification systems, which rely mainly on convolutional models, capture global feature dependencies poorly. To address this problem, a speaker verification system utilizing global-local feature dependency for anti-spoofing was proposed. Firstly, for the speech spoofing detection module, two filter combination schemes were designed to filter the original speech, and sample augmentation was achieved by masking frequency sub-bands. Secondly, a multi-dimensional global attention mechanism was proposed, in which the channel, frequency, and time dimensions were pooled separately to obtain the global dependencies of each dimension, and the global information was fused with the original features by weighting. Finally, for the speaker verification part, a Statistical Pyramid Dense Time Delay Neural Network (SPD-TDNN) was introduced to compute the standard deviation of the features and add global information while obtaining multi-scale time-frequency features. Experimental results show that, on the ASVspoof2019 dataset, the proposed speech spoofing detection system reduces the Equal Error Rate (EER) by 65.4% compared with the Audio Anti-Spoofing using Integrated Spectro-Temporal graph attention network (AASIST) model, and the proposed anti-spoofing speaker verification system reduces the spoofing-aware speaker verification EER by about 97.8% compared with the standalone pyramid pooling speaker verification system. These results verify that both proposed modules achieve better classification results with the help of global feature dependency.
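The multi-dimensional global attention idea described in the abstract can be illustrated with a minimal sketch. The abstract only states that the channel, frequency, and time dimensions are pooled separately and that the global information is fused with the original features by weighting; everything else here is an assumption made for illustration: the function name `global_attention`, the use of average pooling, the parameter-free sigmoid gates, and the fusion rule (scaling each element by the mean of its three gates). It is not the authors' implementation, which presumably uses learned layers.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def global_attention(x):
    """Hypothetical sketch: x is a nested list shaped [C][F][T]
    (channel, frequency, time). Returns a list of the same shape."""
    C, F, T = len(x), len(x[0]), len(x[0][0])

    # Global average pooling: collapse the other two axes so that each
    # channel, frequency bin, and time frame gets one global descriptor.
    ch = [sum(x[c][f][t] for f in range(F) for t in range(T)) / (F * T)
          for c in range(C)]
    fr = [sum(x[c][f][t] for c in range(C) for t in range(T)) / (C * T)
          for f in range(F)]
    tm = [sum(x[c][f][t] for c in range(C) for f in range(F)) / (C * F)
          for t in range(T)]

    # Assumption: turn descriptors into weights in (0, 1) with a sigmoid,
    # with no learned parameters.
    wc = [sigmoid(v) for v in ch]
    wf = [sigmoid(v) for v in fr]
    wt = [sigmoid(v) for v in tm]

    # Assumption: fuse by scaling each original feature with the mean of
    # the three gates that apply to its position.
    return [[[x[c][f][t] * (wc[c] + wf[f] + wt[t]) / 3.0
              for t in range(T)]
             for f in range(F)]
            for c in range(C)]
```

Because every gate lies strictly between 0 and 1, this toy version can only attenuate features; a trained module would typically learn the mapping from pooled descriptors to weights.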

Keywords: speaker verification; data augmentation; frequency masking; attention mechanism; speech spoofing detection

ZHANG Jialin, REN Qinghua, MAO Qirong

School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China

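Jiangsu Engineering Research Center of Big Data Ubiquitous Perception and Intelligent Agriculture Applications (Jiangsu University), Zhenjiang, Jiangsu 212013, China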

2025

Journal of Computer Applications
Chengdu Institute of Computer Applications, Chinese Academy of Sciences

Peking University Core Journal
Impact factor: 0.892
ISSN: 1001-9081
Year, volume (issue): 2025, 45(1)