Journal of Beijing University of Technology, 2025, Vol. 51, Issue 1: 42-50. DOI: 10.11936/bjutxb2023060027

Speaker Recognition Algorithm Based on ASP-SERes2Net

令晓明 (1), 陈鸿雁 (2), 张小玉 (2), 张真 (2)

Author information

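  • 1. Key Laboratory of Opto-Technology and Intelligent Control, Ministry of Education, Lanzhou Jiaotong University, Lanzhou 730070; National Engineering Research Center for Technology and Equipment of Green Coating, Lanzhou Jiaotong University, Lanzhou 730070
  • 2. Key Laboratory of Opto-Technology and Intelligent Control, Ministry of Education, Lanzhou Jiaotong University, Lanzhou 730070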

Abstract

To improve feature extraction for speaker recognition and address the low recognition rate in noisy environments, a speaker recognition algorithm based on a residual network, ASP-SERes2Net, is proposed. First, the Mel spectrogram is used as the input of the neural network. Second, the residual block of Res2Net is improved, and a squeeze-and-excitation (SE) attention module is introduced after each residual block. Then, the original average pooling is replaced by attention statistics pooling (ASP). Finally, the additive angular margin Softmax (AAM-Softmax) function is used to classify speaker identity. In experiments, the ASP-SERes2Net algorithm was compared with a time delay neural network (TDNN), ResNet34, and Res2Net. ASP-SERes2Net achieved a minimum detection cost function (MinDCF) value of 0.0401 and an equal error rate (EER) of 0.52%, clearly better than the other three models. The results show that the ASP-SERes2Net algorithm performs better and is suitable for speaker recognition in noisy environments.
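The pooling and classification steps named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names are hypothetical, the attention scores are taken as given (the small network that would produce them is omitted), and the `margin` and `scale` values are illustrative assumptions rather than the paper's settings. The sketch follows the commonly used formulations of attention statistics pooling (weighted mean and weighted standard deviation) and of the additive angular margin Softmax (margin added to the target-class angle).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_stats_pooling(frames, scores):
    """Pool T frame-level vectors into one utterance-level vector.

    frames: list of T feature vectors, each a list of D floats.
    scores: list of T unnormalized attention scores (assumed to come
            from a small attention network, which is omitted here).
    Returns the weighted mean followed by the weighted standard
    deviation, i.e. a list of length 2 * D.
    """
    weights = softmax(scores)
    dim = len(frames[0])
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    second = [sum(w * f[d] ** 2 for w, f in zip(weights, frames)) for d in range(dim)]
    # Clamp at zero to guard against tiny negative values from rounding.
    std = [math.sqrt(max(s - m * m, 0.0)) for s, m in zip(second, mean)]
    return mean + std

def aam_softmax_loss(cosines, target, margin=0.2, scale=30.0):
    """Cross-entropy with an additive angular margin on the target class.

    cosines: cosine similarity between the embedding and each class weight.
    margin, scale: illustrative hyperparameters (assumptions).
    Simplification: the case theta + margin > pi is not treated specially.
    """
    theta = math.acos(max(-1.0, min(1.0, cosines[target])))
    logits = [scale * c for c in cosines]
    logits[target] = scale * math.cos(theta + margin)
    return -math.log(softmax(logits)[target])
```

With equal attention scores the pooling reduces to the ordinary mean and standard deviation over frames, which is a convenient sanity check. A nonzero margin makes the target class harder to satisfy, so the loss for the same embedding is larger than with `margin=0.0`.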

Key words

speaker recognition / Mel spectrogram / Res2Net / squeeze-and-excitation (SE) attention module / attention statistics pooling (ASP) / additive angular margin Softmax (AAM-Softmax)

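Publication year: 2025
Journal: Journal of Beijing University of Technology (Beijing University of Technology)
Indexing: Peking University Core Journals (北大核心)
Impact factor: 0.418
ISSN: 0254-0037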