Audio Adversarial Examples Generation Method Based on Self-attention Mechanism

Li Zhuhai¹, Guo Wu¹

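Author information

1. National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei 230027, China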

Abstract

With the spread of personal speech data on the Internet and the development of automatic speaker recognition algorithms, personal voiceprint features are at risk of leakage. Audio adversarial examples can cause automatic speaker recognition algorithms to fail while sounding unchanged to human listeners, thereby protecting personal voiceprint features. This paper introduces a multi-head self-attention mechanism into FoolHD, a typical audio adversarial example generation algorithm, to improve adversarial example generation; the resulting method is called FoolHD-MHSA. First, a convolutional neural network is used as the encoder to extract an adversarial perturbation spectrogram from the spectrogram of the input audio. Second, the self-attention mechanism extracts, from a global perspective, the correlations among features of different parts of the perturbation spectrogram, while focusing the network on the key information in the perturbation spectrogram and suppressing useless information. Finally, a decoder steganographically embeds the processed perturbation spectrogram into the input spectrogram to obtain the adversarial example spectrogram. Experimental results show that adversarial examples generated by FoolHD-MHSA achieve a higher attack success rate and a higher average perceptual evaluation of speech quality (PESQ) score than those generated by FoolHD.
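The self-attention step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give layer sizes, the number of heads, or how the encoder output is arranged, so the function name `multi_head_self_attention`, the (channels, frequency, time) layout, the projection matrices and the choice of treating every time-frequency position as one token are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(feat, w_q, w_k, w_v, w_o, num_heads):
    """Apply multi-head self-attention to an encoder feature map.

    feat: array of shape (channels, freq, time); an assumed layout for the
          perturbation spectrogram features produced by a CNN encoder.
    w_q, w_k, w_v, w_o: (channels, channels) projection matrices
          (hypothetical parameters; learned in a real model).
    Returns the attended feature map (same shape as feat) and the
    attention weights of shape (num_heads, freq*time, freq*time).
    """
    c, f, t = feat.shape
    assert c % num_heads == 0, "channels must be divisible by num_heads"
    d_head = c // num_heads

    # Treat every time-frequency position as one token of dimension c.
    tokens = feat.reshape(c, f * t).T            # (n, c), n = f * t
    n = tokens.shape[0]

    def split_heads(x):
        # (n, c) -> (num_heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(tokens @ w_q)
    k = split_heads(tokens @ w_k)
    v = split_heads(tokens @ w_v)

    # Scaled dot-product attention: every position attends to all others,
    # which is what gives the module its global view of the spectrogram.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (h, n, n)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                    # (h, n, d_head)

    # Concatenate the heads and apply the output projection.
    merged = heads.transpose(1, 0, 2).reshape(n, c)
    out = merged @ w_o                                     # (n, c)
    return out.T.reshape(c, f, t), weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 4, 5))   # toy feature map, not real audio
    w_q, w_k, w_v, w_o = (0.1 * rng.standard_normal((8, 8)) for _ in range(4))
    out, weights = multi_head_self_attention(feat, w_q, w_k, w_v, w_o, num_heads=2)
    print(out.shape, weights.shape)
```

Each row of `weights` sums to 1, so a large entry marks a time-frequency position that the corresponding head emphasizes and a small entry marks one it suppresses. In a trained model the projection matrices would be learned jointly with the encoder and decoder; here they are random and serve only to show the shapes involved.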

Key words

self-attention mechanism / adversarial examples / speaker recognition / deep neural network

Publication year

2024

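Journal of Data Acquisition and Processing

Chinese Institute of Electronics; Signal Processing Society of the China Instrument and Control Society; Weak Signal Detection Society of the China Instrument and Control Society and the Chinese Physical Society; Nanjing University of Aeronautics and Astronautics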

Indexed in: CSTPCD, CSCD, Peking University Core Journals

Impact factor: 0.679

ISSN: 1004-9037

Number of references: 16
