应用声学 (Journal of Applied Acoustics), 2024, Vol. 43, Issue 5: 949-955. DOI: 10.11684/j.issn.1000-310X.2024.05.003

Dense multi-branch time delay neural network for speaker recognition

和椿皓 常铁原 潘立冬

Author information

  • 1. College of Electronic Information Engineering, Hebei University, Baoding 071000

Abstract

Time delay neural networks (TDNNs) were among the earlier neural networks applied to speaker recognition. To improve recognition performance, recent work has focused on deepening or widening the network structure. Building on a study of densely connected convolutional networks and multi-branch network structures, this paper proposes a dense multi-branch time delay neural network that further improves the ability of small models to extract speaker features. Dense connections provide feature reuse, and on top of them a parallel multi-branch structure extracts features from the same input at several resolutions simultaneously. Tests on the VoxCeleb1 test set, VoxCeleb1-H, and VoxCeleb1-E show that the network achieves accurate speaker recognition with a small number of model parameters, which makes it suitable for local speaker recognition scenarios where storage space is limited.
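The two ideas named in the abstract, dense connections and parallel branches, can be illustrated with a small sketch. This is not the authors' architecture: the abstract does not give layer sizes, branch counts, or how "different resolutions" are obtained. The sketch assumes each branch is a single TDNN layer (a 1-D dilated convolution over time with ReLU) and that the branches differ only in dilation; the names `tdnn_layer`, `random_weights`, `dense_multi_branch_block`, and all sizes are hypothetical.

```python
import random


def tdnn_layer(x, weights, dilation):
    """One TDNN layer: 1-D dilated convolution over time, zero padded, with ReLU.

    x:       list of T frames, each a list of C_in floats
    weights: list of K taps; each tap is a C_out x C_in matrix (list of lists)
    """
    num_frames, num_taps = len(x), len(weights)
    c_out = len(weights[0])
    half = (num_taps - 1) // 2
    out = []
    for t in range(num_frames):
        frame = [0.0] * c_out
        for k in range(num_taps):
            src = t + (k - half) * dilation  # frame index this tap looks at
            if 0 <= src < num_frames:        # frames outside the signal count as zero
                for o in range(c_out):
                    frame[o] += sum(w * v for w, v in zip(weights[k][o], x[src]))
        out.append([max(0.0, v) for v in frame])  # ReLU
    return out


def random_weights(rng, taps, c_out, c_in):
    """Small random weights, standing in for trained parameters."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(c_in)]
             for _ in range(c_out)]
            for _ in range(taps)]


def dense_multi_branch_block(x, branch_weights, dilations):
    """Apply parallel branches to the same input, then concatenate densely."""
    # Multi-branch: every branch sees the same input with a different dilation
    # (assumed here to be what gives each branch its own temporal resolution).
    branches = [tdnn_layer(x, w, d) for w, d in zip(branch_weights, dilations)]
    out = []
    for t in range(len(x)):
        frame = list(x[t])        # dense connection: keep the input features
        for b in branches:
            frame.extend(b[t])    # append each branch's new features
        out.append(frame)
    return out


rng = random.Random(0)
T, C_IN, GROWTH = 20, 8, 4        # hypothetical sizes
DILATIONS = (1, 2, 3)             # hypothetical: one dilation per branch
x = [[rng.gauss(0.0, 1.0) for _ in range(C_IN)] for _ in range(T)]
weights = [random_weights(rng, 3, GROWTH, C_IN) for _ in DILATIONS]
y = dense_multi_branch_block(x, weights, DILATIONS)
print(len(y), len(y[0]))          # 20 frames, 8 + 3 * 4 = 20 channels
```

Each block leaves the input channels untouched and adds only `GROWTH` channels per branch, so a later block can reuse earlier features instead of recomputing them. That is the usual argument for why dense connectivity keeps parameter counts low.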

Key words

Speaker recognition/Time delay neural networks/Multi-branch neural networks/Dense connectivity/Deep learning

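Publication year

2024

应用声学 (Journal of Applied Acoustics)
Institute of Acoustics, Chinese Academy of Sciences

Indexing: CSTPCD; 北大核心 (Peking University Core Journals)
Impact factor: 1.128
ISSN: 1000-310X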