A review on speaker audio attack and defense technologies

SUN Zhixin (孙知信)¹, ZHAO Jie (赵杰)¹, WANG Enliang (王恩良)¹, LIU Chenlei (刘晨磊)¹, FAN Liancheng (范连成)², LIU Chang (刘畅)²

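Author information

1. Post Industry Technology Research and Development Center of the State Posts Bureau (Internet of Things Technology), Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China; Key Laboratory of Broadband Wireless Communication and Sensor Network Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China; Jiangsu Engineering Research Center of Postal Big Data Technology and Application, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China
2. Postal Industry Development Center of Nanling County, Wuhu 241399, Anhui, China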
Abstract

This paper reviews recent advances in speaker audio attack and defense technologies. Because speaker audio attacks have become a serious threat to the security of voice applications, we organize the attack techniques around three models whose adoption marks successive stages in the field, WaveNet, Transformer and GAN, and introduce the attacks built on each. Defense technologies are divided into three categories according to the attacks they cover: basic audio attacks, replay attacks and deepfake attacks. We systematically describe the latest research on audio attack and defense, analyze and compare the strengths and weaknesses of each algorithm under different conditions, and introduce the datasets commonly used in audio research. Finally, in light of the current state of the field, we identify the issues in speaker audio attack and defense research that most urgently need attention.

Key words

speaker audio/audio forgery/audio forensics/audio datasets/deep learning

Funding

National Natural Science Foundation of China (61972208)

National Natural Science Foundation of China (62272239)

Publication year

2024

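Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)

Nanjing University of Posts and Telecommunications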

Indexed in: CSTPCD, Peking University Core Journals

Impact factor: 0.486

ISSN: 1673-5439
