
Speaker Verification Network Based on Multi-scale Convolutional Encoder

Speaker verification is an effective biometric authentication method, and the quality of speaker embeddings largely determines the performance of a speaker verification system. Recently, the Transformer model has shown great potential in automatic speech recognition, but because its conventional self-attention mechanism is weak at extracting local features, it struggles to produce effective speaker embeddings; as a result, Transformer models have found it hard to surpass earlier convolutional networks in speaker verification. To improve the Transformer's ability to capture local features, this paper proposes a new self-attention mechanism for the Transformer encoder, called the Multi-scale Convolutional Self-Attention Encoder (MCAE). It applies convolution operations at different scales to extract multi-time-scale information and fuses features from the time and frequency domains, giving the model a richer representation of local features; such an encoder design is more effective for speaker verification. Experiments show that the proposed method achieves better overall performance on three public test sets. Compared with the conventional Transformer encoder, MCAE is also more lightweight, which favors deployment of the model in applications.
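The abstract describes extracting multi-time-scale information with convolutions of different kernel sizes and fusing the resulting features. The sketch below illustrates that general idea in plain NumPy; the kernel sizes, averaging kernels, and concatenation-based fusion are illustrative assumptions, not the paper's actual MCAE design.

```python
import numpy as np

def conv1d_same(x, kernel):
    """Per-channel 1-D convolution over time with 'same' padding.
    x: (channels, time); kernel: (k,), shared across channels."""
    k = len(kernel)
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)), mode="constant")
    out = np.zeros(x.shape, dtype=float)
    for t in range(x.shape[1]):
        out[:, t] = xp[:, t:t + k] @ kernel
    return out

def multi_scale_features(x, kernel_sizes=(3, 5, 7)):
    """Run one convolution branch per temporal scale and fuse the
    branches by concatenating along the channel axis (a stand-in
    for MCAE's multi-scale fusion, whose exact form is not given
    in the abstract)."""
    branches = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k  # simple averaging kernel at this scale
        branches.append(conv1d_same(x, kernel))
    return np.concatenate(branches, axis=0)  # (len(kernel_sizes)*C, T)

# Example: 80-dim filter-bank frames over 200 time steps.
x = np.random.randn(80, 200)
feats = multi_scale_features(x)
print(feats.shape)  # (240, 200)
```

Each branch sees the same input at a different receptive-field width, so short- and long-range temporal context both survive into the fused representation.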

Speaker verification; Speaker embedding; Self-attention mechanism; Transformer encoder; Multi-scale convolution

Liu Xiaohu, Chen Defu, Li Jun, Zhou Xuwen, Hu Shan, Zhou Hao


College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

Zhejiang iFLYTEK Intelligent Technology Co., Ltd., Hangzhou 310000, China


Hangzhou Major Science and Technology Innovation Project

2022AIZD0055

2024

Computer Science (Jisuanji Kexue)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.944
ISSN:1002-137X
Year, Vol. (Issue): 2024, 51(S1)