A Speaker Recognition Algorithm Based on a Densely Connected Time Delay Neural Network

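Chinese abstract (translated): Speaker recognition is an important biometric identification technology. In recent years, speaker recognition algorithms that use time delay neural networks (TDNNs) to extract vocal features have achieved outstanding results. To further strengthen the TDNN's ability to extract speaker features, and to improve recognition accuracy without consuming excessive computational resources, this work studies existing speaker recognition algorithms and proposes a densely connected TDNN with attention mechanisms for speaker recognition. The densely connected structure increases information reuse across network layers while keeping the model size under control. Channel attention and frame attention help the network focus on the more critical feature details, so that the speaker features extracted by statistics pooling are more representative. Experiments show that the method achieves an equal error rate (EER) of 1.40% and a minimum detection cost (MinDCF) of 0.15 on the VoxCeleb1 test set, demonstrating its effectiveness on the speaker recognition task.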
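The abstract says frame attention makes the features produced by statistics pooling more representative. The following is a minimal sketch of one common way to combine the two, assuming each frame receives a scalar attention score that is softmax-normalized and used to weight the mean and standard deviation. The function name `attentive_stats_pooling` and its inputs are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def attentive_stats_pooling(frames, attn_logits):
    """Pool frame-level features into one utterance-level vector.

    frames:      (T, C) array of frame-level features (hypothetical input).
    attn_logits: (T,) array of unnormalized per-frame attention scores.
    Returns a (2 * C,) vector: weighted mean followed by weighted std.
    """
    frames = np.asarray(frames, dtype=float)
    logits = np.asarray(attn_logits, dtype=float)

    # Softmax over the time axis (shifted by the max for numerical stability).
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Attention-weighted first- and second-order statistics per channel.
    mean = weights @ frames
    var = weights @ (frames - mean) ** 2
    std = np.sqrt(np.maximum(var, 1e-12))  # clamp to avoid sqrt of tiny negatives

    return np.concatenate([mean, std])
```

With equal attention scores this reduces to plain statistics pooling (per-channel mean and population standard deviation); strongly peaked scores make the pooled mean approach the features of the highest-scoring frame.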

English title: A speaker recognition algorithm based on densely connected time delay neural network

English abstract: Speaker recognition is an important biometric identification technology. In recent years, speaker recognition algorithms that use time delay neural networks to extract vocal features have achieved outstanding results. To further enhance the ability of the time delay neural network to extract speaker features and to improve recognition accuracy without consuming too many computational resources, a densely connected time delay neural network with an attention mechanism is proposed for speaker recognition, based on a study of existing speaker recognition algorithms. The densely connected structure enhances information reuse between different network layers while effectively controlling the model size. The channel attention mechanism and frame attention mechanism help the network focus on the more critical details of the features, making the speaker features extracted by statistics pooling more representative. Experimental results show that the method achieves an equal error rate (EER) of 1.40% and a minimum detection cost function (MinDCF) of 0.15 on the VoxCeleb1 test dataset, demonstrating its effectiveness on the speaker recognition task.
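The headline metric above is the equal error rate: the operating point at which the false acceptance rate equals the false rejection rate. The sketch below shows one simple way to estimate it from verification scores, assuming higher scores mean "same speaker" and using every observed score as a candidate threshold. The function name `equal_error_rate` and the toy scores are illustrative assumptions; this is not the paper's evaluation code.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER from same-speaker and different-speaker trial scores."""
    tar = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)

    # Candidate thresholds: every observed score, in ascending order.
    thresholds = np.sort(np.concatenate([tar, non]))

    # A trial is accepted when its score is >= the threshold.
    frr = np.array([(tar < t).mean() for t in thresholds])   # targets rejected
    far = np.array([(non >= t).mean() for t in thresholds])  # non-targets accepted

    # Take the threshold where the two error rates are closest and average them.
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0

# Toy example: one target trial scores below two non-target trials.
eer = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
print(f"EER = {eer:.2%}")  # prints "EER = 25.00%"
```

On large trial lists a sorted-cumulative-count implementation would be faster than this per-threshold loop; the loop is kept here because it mirrors the definition directly.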

English keywords:

Speaker recognition; Deep learning; Neural network; Dense connectivity; Attention mechanism

Authors:

和椿皓, 常铁原, 潘立冬, 王珺

Affiliation:

College of Electronic and Information Engineering, Hebei University, Baoding 071000

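Keywords (translated from the Chinese):

speaker recognition; deep learning; neural network; dense connection; attention mechanism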

Funding:

Natural Science Foundation of Hebei Province

Project number:

F2022201013

Publication year:

2024

DOI:

10.11684/j.issn.1000-310X.2024.02.016

Journal of Applied Acoustics (应用声学)

Institute of Acoustics, Chinese Academy of Sciences

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 1.128

ISSN: 1000-310X

Year, volume (issue): 2024, 43(2)

Number of references: 28