A survey of Deepfake and related digital forensics

Ding Feng (丁峰)¹, Kuang Rensheng (匡仁盛)¹, Zhou Yue (周越)¹, Sun Long (孙珑)², Zhu Xiaogang (朱小刚)³, Zhu Guopu (朱国普)²

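Author information

1. School of Software, Nanchang University, Nanchang 330047
2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150006
3. School of Public Policy and Administration, Nanchang University, Nanchang 330047; Jiangxi Institute of Internet of Things Industry Technology, Yingtan 335003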

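Abstract (translated from Chinese)

Deep learning, a promising and important branch of machine learning, has achieved major breakthroughs in computer vision. Deepfake generally refers to techniques that use deep learning to forge multimedia involving human faces and voices, and its malicious misuse could bring disaster to society. Deepfake is not limited to face replacement; it also covers modifying facial attributes, modifying expressions, lip synchronization, pose changes, entire face synthesis, and audio-to-video and text-to-video tampering. Because the human face is sensitive in social, political, and economic contexts, Deepfake technology threatens the security of both society and individuals, and detecting Deepfake products has become an important research topic in digital forensics. To provide an up-to-date overview of research on Deepfake detection, this paper describes the various approaches proposed for Deepfake-related problems. It draws mainly on Deepfake papers retrieved from Google Scholar for the five years 2018-2022 and analyzes and compares them by category. It describes in detail the characteristics of Deepfake datasets and the forgery methods behind them, outlines Deepfake techniques and their basic principles, reports detector performance on Deepfake datasets, classifies Deepfake detection techniques by input dimension, shallow features, and deep features, and discusses prospects for future development.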

Abstract

Deep learning, a promising branch of machine learning, has made significant breakthroughs in computer vision. However, Deepfake, which refers to the set of techniques for forging human-related multimedia data using deep learning, can bring disaster to society if used maliciously. It is not limited to face replacement; it also covers other manipulations, such as fabricating facial features, manipulating expressions, synchronizing lips, modifying head gestures, entire face synthesis, and audio-to-video and text-to-video tampering. Moreover, it can be used to generate fake pornographic videos or even fake speeches intended to subvert state power. Deep forgery technology can therefore greatly threaten society and individuals, and detecting Deepfake has become an important research topic in digital forensics.

We conducted a systematic and critical survey to provide an overview of the latest research on Deepfake detection by exploring recent developments in Deepfake and related forensic techniques. The survey draws mainly on Deepfake papers indexed in Google Scholar during 2018-2022. It divides Deepfake detection techniques into two categories for analysis and comparison: input dimensions and forensic features. First, a comprehensive and systematic introduction to digital forensics is presented from the following aspects: 1) the development and security of deep forgery detection technology, 2) Deepfake technology architecture, and 3) the prevailing datasets and evaluation metrics. Then, the survey presents Deepfake techniques in several categories. Finally, future challenges and development prospects are discussed.

In terms of image and video effects, Deepfake techniques are usually divided into four categories: face replacement, lip synchronization, head puppetry, and attribute modification. The most commonly used Deepfake algorithms are based on autoencoders, generative adversarial networks, and diffusion models. A typical autoencoder consists of two convolutional neural networks acting as an encoder and a decoder. The encoder reduces the dimensionality of the input target's facial image and encodes it into a vector corresponding to facial features. The encoder's parameters are shared; that is, the same encoder is used so that it learns only the common feature information. A generative adversarial network consists of a generator and a discriminator. The generator is similar to the decoder in an autoencoder: it converts input noise into a picture, which is sent to the discriminator together with real pictures for discrimination. The discriminator and the generator both optimize their parameters through back-propagation. A diffusion model is a parameterized Markov chain trained using variational inference to produce samples that match the data after a finite time. Training a diffusion model involves two processes. One is the forward process, also called the diffusion process. The other is reverse diffusion, also known as the reverse process, which slowly restores the original image from noise through continuous sampling.

In the Deepfake detection task, the datasets have also evolved to fill past gaps. In general, this survey divides the Deepfake datasets into two generations. The first-generation datasets are often not large enough, and the quality of their content is unsatisfactory because research interest was still limited at the time. Their source videos usually come from video sites or existing face datasets, which can raise copyright and privacy concerns. The main first-generation datasets are UADFV, DF-TIMIT, FaceForensics, and the diverse fake face dataset (DFFD). The second generation of face forgery datasets has improved forgery effects and image clarity. The main second-generation datasets are Celeb-DF, the Deepfake detection challenge dataset (DFDC) preview, DeeperForensics-1.0, DFDC, the Korean Deepfake detection dataset (KoDF), etc.

In terms of input dimension, Deepfake detection can be roughly divided into three categories: 1) The first takes an image or a key frame extracted from the video as input and judges it by its visual appearance. This category is commonly used because it transfers easily to other computer vision classification models and because most Deepfake videos are generated frame by frame. 2) The second takes multiple consecutive frames from the video as input, allowing the model to perceive how the relationships between frames differ in real and fake videos. 3) The third takes multiple frames and audio from the video simultaneously; that is, the video's authenticity is judged by examining its video frames and audio together.

The features that Deepfake detection techniques focus on also vary. This survey divides them into four categories: 1) The frequency domain-based approach looks for anomalies in the video at the signal level, treating the video as a sequence of frames and a synchronized audio signal. Such anomalies, including image mismatches and mismatches in audio-video synchronization, usually arise from signal-level mismatches during Deepfake video generation. 2) The texture and spatio-temporal approaches exploit the fact that forged video generation tends to focus only on face position and feature matching, so flaws that violate the laws of physics and human physiology may occur. 3) The reconstruction-classification learning methods emphasize the common compact representations of genuine faces and enhance the learned representations so that they are aware of unknown forgery patterns. Classification learning involves mining the essential discrepancy between real and fake images, which facilitates the understanding of forgeries. 4) Data-driven methods are detection methods that do not target specific features. Instead, they use supervised learning, feeding real and fake videos into the model for training.

The road ahead for research on deep forgery techniques and deep forgery detection is still long. We must overcome the existing shortcomings and face the challenges of future technological advances.
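
As an illustration of the forward (diffusion) process mentioned in the abstract, the following is a minimal sketch in plain Python. It assumes the common Gaussian formulation in which noise is added with a fixed variance schedule, so that a noisy sample at step t can be drawn directly from the clean input. The linear schedule, its endpoint values, and all function names here are illustrative assumptions, not details taken from the survey. The reverse process requires a trained network and is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_t, linearly spaced (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s).

    This is the fraction of the original signal variance left after t steps.
    """
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Draw x_t from N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    a = alpha_bars[t]
    signal_scale = math.sqrt(a)
    noise_scale = math.sqrt(1.0 - a)
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    pixels = [0.5, -0.25, 1.0, 0.0]  # toy "image" with values scaled to [-1, 1]
    for t in (0, 499, 999):
        # As t grows, alpha_bar_t shrinks and the sample approaches pure noise.
        print(t, alpha_bars[t], forward_diffuse(pixels, t, alpha_bars, rng))
```

Under these assumptions the retained signal fraction decreases monotonically with t, so the last step is almost pure Gaussian noise. That is the starting point from which the reverse process gradually restores an image.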

Key words

Deepfake/machine learning/artificial intelligence/deep learning/digital forensics/digital anti-forensics

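Funding

National Natural Science Foundation of China (62262041)

National Natural Science Foundation of China (62172402)

Research and Application of Key Technologies for Data Security Governance Project (20224BBC41001)

Natural Science Foundation of Jiangxi Province (20232BAB202011)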

Publication year

2024

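Journal of Image and Graphics (中国图象图形学报)

Institute of Remote Sensing Applications, Chinese Academy of Sciences; China Society of Image and Graphics; Beijing Institute of Applied Physics and Computational Mathematics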

Indexed in: CSTPCD, CSCD, Peking University Core Journals

Impact factor: 1.111

ISSN: 1006-8961

Number of references: 178
