With the rapid advancement of deep learning, deepfake technology, a form of image manipulation based on generative models, has gained significant momentum. The proliferation of deepfake videos and images has detrimental sociopolitical impacts, making deepfake detection techniques increasingly important. Existing detection methods based on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) commonly suffer from large parameter counts, slow training, susceptibility to overfitting, and limited robustness against video compression and noise. To address these challenges, a multi-scale deepfake detection method that fuses spatial features is proposed herein. First, an Automatic White Balance (AWB) algorithm adjusts the contrast of input images, improving the robustness of the model. Next, a Multi-scale ViT (MViT) and a CNN are used to extract multi-scale global features and local features of the input images, respectively. These global and local features are then fused by an improved sparse cross-attention mechanism to enhance the recognition performance of the model. Finally, the fused features are classified by a Multi-Layer Perceptron (MLP). Experimental results show that the proposed model achieves frame-level Area Under the Curve (AUC) scores of 0.986, 0.984, and 0.988 on the Deepfakes, FaceSwap, and Celeb-DF (v2) datasets, respectively, and demonstrates strong robustness in cross-compression experiments. In addition, ablation experiments comparing the model before and after each improvement validate the contribution of each module to the detection results.
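The abstract does not specify which AWB variant is used; as an illustration only, the widely used gray-world assumption gives a minimal sketch of the preprocessing step. The function below scales each RGB channel so that its mean matches the global mean, removing a color cast before the image is passed to the feature extractors. The function name, input layout (H×W×3 float array in [0, 1]), and the synthetic test image are assumptions, not details from the paper.

```python
import numpy as np

def gray_world_awb(img: np.ndarray) -> np.ndarray:
    """Gray-world automatic white balance (illustrative sketch).

    Assumes `img` is an HxWx3 float array with values in [0, 1].
    Each channel is rescaled so its mean equals the global mean,
    which neutralizes a uniform color cast.
    """
    channel_means = img.reshape(-1, 3).mean(axis=0)   # mean of R, G, B
    gray_mean = channel_means.mean()                  # target gray level
    gains = gray_mean / np.maximum(channel_means, 1e-8)  # per-channel gain
    return np.clip(img * gains, 0.0, 1.0)

# Usage: a synthetic image with an artificial reddish cast
rng = np.random.default_rng(0)
img = rng.random((64, 64, 3)) * np.array([1.0, 0.7, 0.7])
balanced = gray_world_awb(img)
```

After balancing, the three channel means coincide, so downstream features are less sensitive to illumination differences between pristine and manipulated frames.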