Word-level voice liveness detection method based on an improved convolutional neural network

Li Zhigang (李志刚)¹, Song Xiaoting (宋晓婷)¹, Guo Qimei (郭琪美)¹, Sun Xiaochuan (孙晓川)¹

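Author information

1. School of Artificial Intelligence, North China University of Science and Technology, Tangshan, Hebei 063210, China; Hebei Key Laboratory of Industrial Intelligent Perception, Tangshan, Hebei 063210, China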

Abstract

To improve the detection accuracy of replayed speech in smart-home voice verification systems, a new word-level voice liveness detection method is proposed. A light convolutional global gate recurrent neural network (LC-GGRNN) serves as the deep feature extractor, and a support vector machine (SVM) classifies genuine versus replayed speech; together they form the LC-GGRNN-SVM framework. LC-GGRNN builds on the light convolutional neural network by introducing a global attention mechanism and a gated recurrent unit. The former attends to the channel information, the spatial information, and the channel-spatial interaction information of the extracted features, while the latter learns the long-term correlations of the deep features. Three acoustic features extracted from the audio files of the POCO (pop noise corpus) dataset are each used for model training, validation, and testing. The results show that, of the three, Gammatone frequency cepstral coefficients give the best detection performance with the proposed method: the accuracy and equal error rate are 85.72% and 14.28%, respectively, and the sum of the false acceptance rate and the false rejection rate is 28.59%. The results also indicate that voice liveness detection with the proposed method on POCO is gender-dependent. In addition, the proposed method generalizes well to sentence-level replayed speech detection.
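The abstract reports an equal error rate (EER) together with the false acceptance rate (FAR) and false rejection rate (FRR). The following is a minimal sketch of how these metrics are commonly computed from classifier scores; it is not the authors' evaluation code. The function names, the toy scores, and the convention that a higher score means "more likely genuine" are all assumptions made for illustration.

```python
def far_frr(genuine_scores, replay_scores, threshold):
    """Return (FAR, FRR) at a given decision threshold.

    Assumed convention: a sample is accepted as genuine when score >= threshold.
    FAR = fraction of replayed samples that are wrongly accepted.
    FRR = fraction of genuine samples that are wrongly rejected.
    """
    far = sum(s >= threshold for s in replay_scores) / len(replay_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


def equal_error_rate(genuine_scores, replay_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Returns (eer, threshold) at the point where |FAR - FRR| is smallest;
    the EER is taken as the mean of FAR and FRR at that point.
    """
    best = None
    for t in sorted(set(genuine_scores) | set(replay_scores)):
        far, frr = far_frr(genuine_scores, replay_scores, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented purely for illustration (not from the POCO dataset).
genuine = [0.9, 0.8, 0.7, 0.4]
replay = [0.6, 0.3, 0.2, 0.1]
eer, thr = equal_error_rate(genuine, replay)
print(f"EER = {eer:.2%} at threshold {thr}")
```

At the EER operating point FAR and FRR coincide, so their sum is about twice the EER. The reported sum of 28.59% is close to twice the reported EER of 14.28%, although the abstract does not state the operating point at which the sum was measured.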

Keywords

voice liveness detection/acoustic features/pop noise/light convolutional neural network/support vector machine (SVM)/pop noise corpus (POCO) dataset

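Funding

Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2021088)

National Key Research and Development Program of China (2017YFE0135700)

Publication year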

2024

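Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition)

Chongqing University of Posts and Telecommunications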

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.66

ISSN: 1673-825X

Number of references: 27
