Word-level voice liveness detection method based on improved convolutional neural network

To improve the detection accuracy of replayed speech in smart home voice verification systems, a new word-level voice liveness detection method is proposed. A light convolutional global gate recurrent neural network (LC-GGRNN) serves as the deep feature extractor, and a support vector machine (SVM) classifies genuine and replayed speech; together they form the LC-GGRNN-SVM framework. LC-GGRNN builds on a light convolutional neural network by introducing a global attention mechanism and a gated recurrent unit. The former attends to the channel information, the spatial information, and the channel-spatial interaction information of the extracted features, while the latter learns the long-term correlations of the deep features. Three acoustic features extracted from the audio files of the POCO (pop noise corpus) dataset are each used for model training, validation, and testing. The results show that the Gammatone frequency cepstral coefficient features give the best detection performance with the proposed method: the accuracy and equal error rate are 85.72% and 14.28%, respectively, and the sum of the false acceptance rate and the false rejection rate is 28.59%. The results also indicate that voice liveness detection with the proposed method on POCO is gender-dependent. In addition, the proposed method generalizes well to sentence-level replayed speech detection.
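The abstract reports three related metrics: the equal error rate (EER), the false acceptance rate (FAR), and the false rejection rate (FRR). The sketch below shows one common way to compute them from classifier scores. It is only an illustration: the score lists are hypothetical toy values, the function names `far_frr` and `equal_error_rate` are made up for this example, and the paper's own evaluation procedure is not described in the abstract. It assumes that a higher score means "more likely genuine" and that the EER is approximated by scanning the observed scores as candidate thresholds.

```python
from typing import List, Tuple


def far_frr(genuine: List[float], replay: List[float],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when scores >= threshold are accepted as genuine."""
    # FAR: fraction of replayed utterances wrongly accepted as genuine.
    far = sum(s >= threshold for s in replay) / len(replay)
    # FRR: fraction of genuine utterances wrongly rejected as replayed.
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine: List[float],
                     replay: List[float]) -> Tuple[float, float]:
    """Approximate the EER by trying every observed score as a threshold.

    Returns (eer, threshold) for the threshold where |FAR - FRR| is smallest;
    the EER is taken as the mean of FAR and FRR at that point.
    """
    best = None
    for t in sorted(set(genuine) | set(replay)):
        far, frr = far_frr(genuine, replay, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical toy scores, not taken from the paper.
genuine_scores = [0.9, 0.8, 0.7, 0.6, 0.3]
replay_scores = [0.1, 0.2, 0.4, 0.5, 0.65]

eer, threshold = equal_error_rate(genuine_scores, replay_scores)
far, frr = far_frr(genuine_scores, replay_scores, threshold)
print(f"threshold={threshold}, FAR={far:.2f}, FRR={frr:.2f}, EER={eer:.2f}")
```

At the threshold where FAR and FRR coincide, their sum is twice the EER, and on a class-balanced test set the accuracy equals one minus the EER. The reported figures (85.72% accuracy, 14.28% EER, 28.59% FAR plus FRR) are roughly consistent with that relationship, although the abstract does not state the operating point or class balance used.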

voice liveness detection; acoustic features; pop noise; light convolutional neural network; support vector machine (SVM); pop noise corpus (POCO) dataset

Li Zhigang, Song Xiaoting, Guo Qimei, Sun Xiaochuan

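College of Artificial Intelligence, North China University of Science and Technology, Tangshan, Hebei 063210

Hebei Key Laboratory of Industrial Intelligent Perception, Tangshan, Hebei 063210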

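Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2021088); National Key Research and Development Program of China (2017YFE0135700)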

2024

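Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition)
Chongqing University of Posts and Telecommunications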

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.66
ISSN: 1673-825X
Year, volume (issue): 2024, 36(1)