Emotion fake audio detection method based on deep emotion embedding and graph attention network

Emotion fake audio deceives listeners by altering the emotional state of speech, which poses a new challenge to existing fake audio detection models. This paper proposes an emotion fake audio detection method based on deep emotion embedding and graph attention networks, named Graph Attention Networks Using Deep Emotion Embedding (GADE), to improve the detection of emotion fake audio. GADE comprises two components: a frontend for deep emotion embedding extraction and a backend based on a graph attention network. The frontend uses a co-attention mechanism to combine traditional handcrafted features with deep features, and extracts deep emotional information from the time domain and the frequency domain of speech separately. The backend fuses the time-domain and frequency-domain information, improving the model's detection performance on emotion fake audio. Comparative experiments against common fake audio detection models were conducted on the ASVspoof 2019, ASVspoof 2021, and EmoFake datasets. The results show that, without training on emotion fake audio, GADE improves emotion fake audio detection performance by 22.8% compared with AASIST, an existing advanced fake audio detection model. After training with emotion fake audio, detection performance on emotion fake audio improves by 77.3%.
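To make the backend idea concrete, the following is a minimal sketch of one single-head graph attention pass that fuses time-domain and frequency-domain node embeddings. It follows the generic graph attention formulation (a shared projection, a LeakyReLU-scored attention vector, and a softmax over neighbours) and is not the authors' implementation. The fully connected graph, the toy dimensions, the random weights, the mean pooling, and all names (`gat_layer`, `temporal_nodes`, `spectral_nodes`) are assumptions made for illustration only.

```python
import math
import random


def matvec(W, h):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_layer(nodes, W, a):
    """One single-head graph attention pass over a fully connected graph.

    nodes: list of node feature vectors (temporal and spectral nodes together)
    W:     shared projection matrix, shape (out_dim, in_dim)
    a:     attention vector of length 2 * out_dim
    Returns the updated node vectors and the attention weights.
    """
    projected = [matvec(W, h) for h in nodes]
    updated, attention = [], []
    for hi in projected:
        # Unnormalised score e_ij = LeakyReLU(a . [W h_i || W h_j])
        scores = [
            leaky_relu(sum(w * x for w, x in zip(a, hi + hj)))
            for hj in projected
        ]
        alpha = softmax(scores)
        attention.append(alpha)
        # Updated node = attention-weighted sum of all projected nodes
        updated.append([
            sum(al * hj[k] for al, hj in zip(alpha, projected))
            for k in range(len(hi))
        ])
    return updated, attention


# Toy inputs (assumed): 2 time-domain nodes and 3 frequency-domain nodes
random.seed(0)
in_dim, out_dim = 4, 3
temporal_nodes = [[random.gauss(0, 1) for _ in range(in_dim)] for _ in range(2)]
spectral_nodes = [[random.gauss(0, 1) for _ in range(in_dim)] for _ in range(3)]
W = [[random.gauss(0, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
a = [random.gauss(0, 0.5) for _ in range(2 * out_dim)]

fused, attn = gat_layer(temporal_nodes + spectral_nodes, W, a)
# Mean-pool the fused nodes into one utterance-level vector for a classifier
utterance_embedding = [sum(col) / len(fused) for col in zip(*fused)]
```

Because every node attends to every other node in this sketch, each time-domain node can draw on frequency-domain evidence and vice versa, which is the fusion role the abstract assigns to the backend. A trained model would learn `W` and `a` and would typically stack several such layers.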

fake audio detection; emotion fake audio; deep feature; graph attention network

Zhao Yan (赵炎), Li Qing (李青), Zhou Shuxia (周淑霞), Qi Qiaoling (齐巧玲), Li Yingshuang (李英双), Dong Yongfeng (董永峰)

School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401

Tianjin International Joint Center for Virtual Reality and Visual Computing, Tianjin 300401

Hebei Jiaotong Vocational and Technical College, Shijiazhuang, Hebei 050035

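Hebei Provincial University Research and Development Center for Road Traffic Perception and Intelligent Application Technology, Shijiazhuang, Hebei 050035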

Handan Vocational College of Science and Technology, Handan, Hebei 056046

2024

Journal of Hebei University of Technology
Hebei University of Technology

CSTPCD
Impact factor: 0.344
ISSN: 1007-2373
Year, volume (issue): 2024, 53(6)