Journal of Data Acquisition and Processing (数据采集与处理), 2024, Vol. 39, Issue 5: 1062-1084. DOI: 10.16337/j.1004-9037.2024.05.003

State of the Art and Prospects of Deep Learning-Based Speaker Verification (基于深度学习的说话人确认方法研究现状及展望)

Li Jianchen (李建琛)¹, Han Jiqing (韩纪庆)¹

Author information

  • 1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001

Abstract

With the continued development of deep learning, speaker verification has made substantial progress. Compared with other biometric recognition technologies, it offers remote operation, low cost, and easy human-computer interaction, and therefore has broad application prospects in fields such as public security, criminal investigation, and financial services. This paper systematically reviews the development of deep learning-based speaker verification. First, it describes the development history and current state of deep learning-based speaker representation models in four respects: model input and structure, pooling layers, supervised loss functions, and self-supervised learning and pre-trained models. Second, it discusses the cross-domain mismatch problems that speaker verification faces in practical applications, such as noise interference, channel mismatch, and far-field speech, and outlines the corresponding domain adaptation and domain generalization methods. Finally, it points out directions for further research.

Key words

speaker recognition / speaker verification / deep learning / domain mismatch / self-supervised learning

Funding

National Natural Science Foundation of China (62376071)

Publication year

2024
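Journal of Data Acquisition and Processing (数据采集与处理)
Chinese Institute of Electronics; Signal Processing Society of the China Instrument and Control Society; Weak Signal Detection Society of the China Instrument and Control Society and the Chinese Physical Society; Nanjing University of Aeronautics and Astronautics

Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 0.679
ISSN: 1004-9037
Number of references: 143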