A Feature Fusion Dual-Stream Deepfake Detection Method for Forged Faces
The rapid advancement of Deepfake technology has made forged video and audio content increasingly realistic, and such content is widely misused for political forgery, financial fraud, and the dissemination of fake news. Developing efficient Deepfake detection methods has therefore become crucial. This study explores a strategy that combines Vision Transformers (ViT) with Convolutional Neural Networks (CNN), leveraging the strength of CNNs in local feature extraction and the ability of ViTs to model global relationships, in order to improve the performance of Deepfake detection in practical applications. Moreover, to strengthen the model's resilience to image and video compression, frequency-domain features are introduced and a dual-stream network is employed for feature extraction, improving detection performance and stability under compressed conditions. Experimental results show that the dual-stream network based on multi-domain feature fusion achieves strong detection performance on the FaceForensics++ dataset, with an accuracy (ACC) of 96.98% and an AUC of 98.82%. Satisfactory results are also obtained in cross-dataset evaluation, with an AUC of 75.41% on the Celeb-DF dataset.
Keywords: Deepfake detection; CNN combined with ViT; RGB and frequency-domain feature fusion; cross-compression scenarios
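As a rough illustration of the dual-stream design described above, the sketch below combines an RGB stream (a CNN patch embedding followed by a Transformer encoder, in the spirit of CNN + ViT) with a frequency stream built on the FFT log-amplitude spectrum, then fuses the two feature vectors for binary real/fake classification. This is only a minimal assumption of the architecture, not the authors' implementation: the backbones, feature dimensions, and fusion scheme are placeholders.

```python
# Minimal sketch of a dual-stream RGB + frequency-domain fusion detector.
# Backbone choices, dimensions, and the concatenation-based fusion are
# illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn


class FrequencyStream(nn.Module):
    """Extracts features from the log-amplitude spectrum of the input image."""
    def __init__(self, out_dim: int = 256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, out_dim),
        )

    def forward(self, x):
        # 2-D FFT per channel; the log-amplitude spectrum exposes artifacts
        # that tend to survive image/video compression.
        spec = torch.fft.fft2(x, norm="ortho")
        amp = torch.log1p(torch.abs(spec))
        return self.cnn(amp)


class RGBStream(nn.Module):
    """CNN patch embedding followed by a Transformer encoder (ViT-style)."""
    def __init__(self, out_dim: int = 256, patch: int = 16, dim: int = 256):
        super().__init__()
        # Convolutional patch embedding captures local texture cues.
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        # Self-attention models global relationships between patches.
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.proj = nn.Linear(dim, out_dim)

    def forward(self, x):
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.encoder(tokens)
        return self.proj(tokens.mean(dim=1))               # mean-pool over patches


class DualStreamDetector(nn.Module):
    """Concatenates RGB and frequency features and classifies real vs. fake."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.rgb = RGBStream(feat_dim)
        self.freq = FrequencyStream(feat_dim)
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(), nn.Linear(128, 2)
        )

    def forward(self, x):
        fused = torch.cat([self.rgb(x), self.freq(x)], dim=1)
        return self.head(fused)


if __name__ == "__main__":
    model = DualStreamDetector()
    logits = model(torch.randn(2, 3, 224, 224))  # two 224x224 face crops
    print(logits.shape)  # torch.Size([2, 2])
```

In this sketch the two streams are fused by simple concatenation before the classification head; more elaborate fusion (e.g., attention-based weighting) would fit in the same place without changing the overall two-stream structure.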