A survey of Deepfake and related digital forensics
Deep learning, a promising branch of machine learning, has made significant breakthroughs in computer vision. However, Deepfake, which refers to the set of techniques for forging human-related multimedia data using deep learning, can cause serious harm to society if used maliciously. It is not limited to face replacement but also covers other manipulations, such as fabricating facial features, manipulating expressions, synchronizing lips, modifying head gestures, synthesizing entire faces, and tampering with video based on related audio or text. Moreover, it can be used to generate fake pornographic videos or even fake speeches intended to subvert state power. Deep forgery technology can therefore pose a serious threat to society and individuals, and detecting Deepfakes has become an important research topic in digital forensics. We conducted a systematic and critical survey to provide an overview of the latest research on Deepfake detection by exploring recent developments in Deepfake and related forensic techniques. The survey draws mainly on papers on Deepfake found in Google Scholar for the period 2018-2022. It analyzes and compares Deepfake detection techniques from two perspectives: input dimensions and forensic features. First, a comprehensive and systematic introduction to digital forensics is presented from the following aspects: 1) the development and security of deep forgery detection technology, 2) Deepfake technology architecture, and 3) the prevailing datasets and evaluation metrics. Then, this survey presents Deepfake techniques in several categories. Finally, future challenges and development prospects are discussed. In terms of image and video effects, Deepfake techniques are usually divided into four categories: face replacement, lip synchronization, head puppetry, and attribute modification. The most commonly used Deepfake algorithms are based on autoencoders, generative adversarial networks, and diffusion models. A typical autoencoder consists of two convolutional
neural networks acting as an encoder and a decoder. The encoder reduces the dimensionality of the target's input facial image and encodes it into a vector that represents the facial features. The encoder parameters are shared; that is, the same encoder is used so that it learns only the common feature information. A generative adversarial network consists of a generator and a discriminator. The generator, which resembles the decoder in an autoencoder, converts input noise into an image and passes it to the discriminator together with real images. Both the discriminator and the generator optimize their parameters through back-propagation. A diffusion model is a parameterized Markov chain trained using variational inference to produce samples that match the data after a finite time. Training a diffusion model always involves two processes. One is the forward process, also called the diffusion process. The other is the reverse process, also known as reverse diffusion, which gradually restores the original image from noise through repeated sampling. For the Deepfake detection task, datasets have also evolved to fill earlier gaps. In general, this survey divides the Deepfake datasets into two generations. The first-generation datasets are often not large enough, and the quality of their content is unsatisfactory because research interest was limited at the time. Their source videos usually come from video sites or existing face datasets, which can raise copyright and privacy concerns. The main first-generation datasets are UADFV, DF-TIMIT, FaceForensics, and the Diverse Fake Face Dataset (DFFD). The second-generation face forgery datasets offer more convincing forgeries and higher image clarity. The main second-generation datasets include Celeb-DF, the Deepfake Detection Challenge (DFDC) preview dataset, DeeperForensics-1.0, DFDC, and the Korean Deepfake Detection Dataset (KoDF). In terms of input dimension, Deepfake detection methods can be roughly divided into three
categories: 1) The first takes a single image or a key frame extracted from the video as input and judges it by its visual appearance. This category is commonly used because it extends easily to other computer vision classification models and because most Deepfake videos are produced frame by frame. 2) The second takes continuous frames from the video as input. In particular, multiple consecutive frames are fed to the model so that it can perceive how the relationships between frames differ in real and fake videos. 3) The third takes multiple frames and audio from the video as input simultaneously; that is, the video's authenticity is assessed by examining its video frames and audio together. The features that Deepfake detection techniques focus on also vary. This survey divides them into four categories: 1) Frequency domain-based approaches look for anomalies in the video at the signal level, treating the video as a sequence of frames and a synchronized audio signal. Such anomalies, including image mismatches and mismatches in audio-video synchronization, usually arise from signal-level inconsistencies introduced during Deepfake video generation. 2) Texture and spatio-temporal approaches exploit the fact that forged video generation tends to focus only on face position and feature matching, so artifacts that violate the laws of physics or human physiology may appear. 3) Reconstruction-classification learning methods emphasize the common compact representations of genuine faces and enhance the learned representations so that they are sensitive to unknown forgery patterns. Classification learning involves mining the essential discrepancy between real and fake images, which facilitates the understanding of forgeries. 4) Data-driven methods do not target specific features; instead, they train the model on real and fake videos using supervised learning. The road ahead for research on
deep forgery techniques and deep forgery detection is still long. We must overcome existing shortcomings and meet the challenges posed by future technological advances.
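
As an illustration of the forward (diffusion) process of a diffusion model described above, the following minimal sketch adds Gaussian noise to a toy "image" using the closed-form expression commonly used in denoising diffusion probabilistic models, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The survey does not prescribe a noise schedule; the linear schedule, its endpoint values, and all function names here are assumptions chosen only for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule; the endpoint values are assumptions."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) for s <= t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t given x_0 in one shot, without iterating over earlier steps."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [0.5, -0.25, 1.0, 0.0]  # toy stand-in for flattened pixel values
    for t in (0, 499, 999):
        # alpha_bar_t shrinks toward 0, so x_t approaches pure noise
        print(t, alpha_bars[t], forward_diffuse(x0, t, alpha_bars, rng))
```

Because alpha_bar_t decreases monotonically, the contribution of the original image fades as t grows; the reverse process would be a learned model that undoes these steps, which this sketch does not attempt to implement.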