
A speaker recognition algorithm based on a densely connected time delay neural network

Speaker recognition is an important biometric identification technology. In recent years, speaker recognition algorithms that use time delay neural networks (TDNNs) to extract vocal features have achieved outstanding results. To further strengthen the TDNN's ability to extract speaker features and to improve recognition accuracy without consuming excessive computational resources, this work studies existing speaker recognition algorithms and proposes a densely connected TDNN with attention mechanisms for speaker recognition. The densely connected structure increases information reuse between network layers while keeping the model size under control. The channel attention and frame attention mechanisms help the network focus on the most critical feature details, so that the speaker features extracted by statistics pooling are more representative. Experiments on the VoxCeleb1 test set yield an equal error rate (EER) of 1.40% and a minimum detection cost function (MinDCF) of 0.15, demonstrating the effectiveness of the proposed network on the speaker recognition task.
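
The abstract does not spell out how frame attention feeds into statistics pooling. As a minimal sketch only, assuming the common formulation in which per-frame attention scores are normalised by a softmax over time and then used as weights for the mean and standard deviation, the pooling step could look like the following. The function name `attentive_stats_pooling` and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np

def attentive_stats_pooling(frames, scores):
    """Pool a (T, C) frame-feature matrix into a (2*C,) utterance vector.

    frames: frame-level features, T frames by C channels (assumed layout).
    scores: unnormalised per-frame attention scores, shape (T,).
    """
    frames = np.asarray(frames, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Softmax over time turns the scores into weights that sum to 1.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Attention-weighted mean and standard deviation for each channel.
    mean = weights @ frames
    var = weights @ (frames - mean) ** 2
    std = np.sqrt(np.maximum(var, 1e-12))  # clamp to avoid sqrt of ~0 noise
    return np.concatenate([mean, std])
```

With equal scores for every frame this reduces to ordinary statistics pooling (plain mean and standard deviation over time); unequal scores let frames judged more informative dominate the utterance-level vector.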

Speaker recognition; Deep learning; Neural network; Dense connectivity; Attention mechanism

He Chunhao, Chang Tieyuan, Pan Lidong, Wang Jun


College of Electronic and Information Engineering, Hebei University, Baoding 071000


Natural Science Foundation of Hebei Province (F2022201013)

2024

Journal of Applied Acoustics (应用声学)
Institute of Acoustics, Chinese Academy of Sciences


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.128
ISSN:1000-310X
Year, volume (issue): 2024, 43(2)