Spoofing speech detection based on shallow feature fusion of clustering centers

Wu Dunzhi¹, Chen Weizhen¹

Author information

1. School of Electrical and Electronic Engineering, Wuhan Polytechnic University, Wuhan, Hubei 430048, China

Abstract

To address the high computational cost of extracting features with the wav2vec2.0 model in existing detection systems and the limited generalization of traditional scoring methods, a spoofing speech detection algorithm based on shallow feature fusion with clustering centers is proposed. The deep layers of the wav2vec2.0 model are pruned, the shallow-layer features are passed through attention pooling to shorten their temporal length, and a linear layer determines the fusion weights. Clustering centers are obtained with K-means++, and the cosine similarity between the representation of the current sample and that of the corresponding class center is used for training and scoring to distinguish bona fide from spoofed speech. Experiments on the logical access track datasets of the ASVspoof2019 and ASVspoof2021 challenges show that the number of wav2vec2.0 model parameters is reduced by 60%, and the equal error rates reach 0.34% and 3.67%, respectively. In terms of model compactness and generalization performance, the method is clearly better than comparable wav2vec2.0 front-end models and traditional classifier scoring methods.
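The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes that utterance embeddings are already available as plain vectors (the toy 2-D data below stands in for fused wav2vec2.0 features), clusters each class with K-means++ seeding, and scores a sample by the difference between its best cosine similarity to the bona fide centers and to the spoof centers. The function names, the toy data, and the difference-based score are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_pp(points, k, iters=20, seed=0):
    """Lloyd's K-means with K-means++ seeding; returns k cluster centers."""
    rng = random.Random(seed)
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # K-means++: sample the next seed with probability proportional
        # to the squared distance from the nearest already-chosen center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # all points coincide with existing centers
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def score(embedding, bona_centers, spoof_centers):
    """Assumed scoring rule: higher means more likely bona fide."""
    best_bona = max(cosine(embedding, c) for c in bona_centers)
    best_spoof = max(cosine(embedding, c) for c in spoof_centers)
    return best_bona - best_spoof


# Hypothetical toy embeddings: two clusters per class.
bona = [[1.0, 0.1], [0.9, 0.2], [1.1, -0.1], [0.2, 1.0], [0.1, 0.9], [-0.1, 1.1]]
spoof = [[-1.0, -0.1], [-0.9, -0.2], [-1.1, 0.1], [-0.1, -1.0], [-0.2, -0.9], [0.1, -1.1]]

bona_centers = kmeans_pp(bona, k=2)
spoof_centers = kmeans_pp(spoof, k=2)

print(score([1.0, 0.0], bona_centers, spoof_centers))   # positive: closer to bona fide centers
print(score([-1.0, 0.0], bona_centers, spoof_centers))  # negative: closer to spoof centers
```

Because cosine similarity ignores vector magnitude, the score depends only on the direction of an embedding relative to the class centers. In this sketch the decision threshold would be chosen on a development set, for example at the equal error rate operating point.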

Key words

spoofing speech detection/model compression/pre-trained model/attention pooling/feature fusion/clustering center/cosine similarity

Funding

Scientific Research Fund of the Hubei Provincial Department of Education (B2020061)

Natural Science Foundation of Hubei Province (2022CFB449)

Publication year

2024

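Computer Engineering and Design

Institute 706, Second Academy of China Aerospace Science and Industry Corporation

CSTPCD; Peking University Core Journal

Impact factor: 0.617

ISSN: 1000-7024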
