Dense multi-branch time delay neural network for speaker recognition
Time delay neural networks (TDNNs) are a class of neural networks that have long been applied to speaker recognition. To achieve better recognition performance, recent improvements have focused on deepening or widening their network structures. Building on densely connected convolutional networks and multi-branch network structures, a dense multi-branch time delay neural network is proposed to further improve the speaker feature extraction capability of small models. In addition to the feature reuse provided by dense connections, the parallel multi-branch structure extracts features from the same input at different resolutions simultaneously. Tests on the VoxCeleb1 test set, VoxCeleb1-H, and VoxCeleb1-E show that the network achieves accurate speaker recognition with a small number of model parameters, making it suitable for local speaker recognition scenarios where storage space is limited.
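
The combination of dense connections and parallel branches can be illustrated with a minimal sketch in plain Python. It is not the proposed network. It assumes that "different resolutions" can be approximated by 1-D convolutions with different dilations, and the names (`dilated_conv1d`, `dense_multi_branch_layer`), channel counts, kernel size, and dilation values are all hypothetical choices made for illustration.

```python
import random


def dilated_conv1d(x, weights, dilation):
    """Zero-padded 1-D convolution followed by ReLU.

    x:       list of input channels, each a list of T floats
    weights: nested list indexed as [out_channel][in_channel][tap]
    """
    num_frames = len(x[0])
    kernel = len(weights[0][0])
    half = (kernel - 1) // 2
    out = []
    for w_out in weights:
        row = []
        for t in range(num_frames):
            acc = 0.0
            for c, w_in in enumerate(w_out):
                for j, w in enumerate(w_in):
                    idx = t + (j - half) * dilation  # dilated tap position
                    if 0 <= idx < num_frames:        # implicit zero padding
                        acc += w * x[c][idx]
            row.append(max(acc, 0.0))                # ReLU
        out.append(row)
    return out


def random_weights(rng, out_ch, in_ch, kernel):
    """Small random weights; a trained model would learn these."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(kernel)]
             for _ in range(in_ch)]
            for _ in range(out_ch)]


def dense_multi_branch_layer(x, branch_weights, dilations):
    """Run parallel branches on the same input, then concatenate.

    Each branch sees the same input with a different dilation (assumed
    stand-in for "different resolutions"). The input channels are kept
    and the branch outputs are appended, which is the dense connection.
    """
    branch_outputs = []
    for weights, dilation in zip(branch_weights, dilations):
        branch_outputs.extend(dilated_conv1d(x, weights, dilation))
    return x + branch_outputs


rng = random.Random(0)
num_frames, in_channels, growth = 20, 4, 2   # hypothetical sizes
dilations = (1, 2, 3)                        # hypothetical branch dilations

x = [[rng.gauss(0.0, 1.0) for _ in range(num_frames)]
     for _ in range(in_channels)]

features = x
for _ in range(2):  # two stacked dense multi-branch layers
    channels = len(features)
    weights = [random_weights(rng, growth, channels, 3) for _ in dilations]
    features = dense_multi_branch_layer(features, weights, dilations)

print(len(features), len(features[0]))  # prints "16 20"
```

In this sketch each layer adds only `growth * len(dilations)` channels while every earlier feature map remains available to later layers, which is the property that dense connections rely on to keep parameter counts low.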