Deepfake face detection based on enhanced Swin Transformer
Traditional convolutional neural networks (CNNs) have a limited receptive field and a weak ability to learn feature interactions, so CNN-based deepfake face detection techniques extract a relatively narrow range of features. To address these issues, this paper proposes a deepfake face detection method based on an enhanced Swin Transformer. The method introduces local and global multi-head self-attention mechanisms and leverages the strengths of the Swin Transformer, namely its strong global receptive field and long-range dependency modeling, to effectively capture image context information and temporal relationships in video. Experimental results on the DFDC dataset show that the proposed approach outperforms the baseline methods in deepfake face detection.
enhanced Swin Transformer; deepfake face detection; audiovisual decomposition; consistency analysis; feature fusion
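
The difference between the local and global multi-head self-attention mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the architecture proposed in the paper: the function names, the feature-map size, the window size, and the head count are all illustrative assumptions, the input tokens are used directly as queries, keys, and values (no learned projections), and the shifted-window scheme of the Swin Transformer is omitted.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, num_heads):
    """tokens: (..., N, C). Simplification: tokens serve as Q, K and V."""
    *lead, n, c = tokens.shape
    d = c // num_heads
    # Split channels into heads: (..., N, heads, d) -> (..., heads, N, d).
    h = np.moveaxis(tokens.reshape(*lead, n, num_heads, d), -2, -3)
    scores = h @ np.swapaxes(h, -1, -2) / np.sqrt(d)   # (..., heads, N, N)
    out = softmax(scores) @ h                          # (..., heads, N, d)
    # Merge heads back into the channel dimension.
    return np.moveaxis(out, -3, -2).reshape(*lead, n, c)

def local_window_attention(fmap, window, num_heads):
    """Attention restricted to non-overlapping window x window patches."""
    H, W, C = fmap.shape
    x = fmap.reshape(H // window, window, W // window, window, C)
    x = x.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, C)
    x = multi_head_self_attention(x, num_heads)
    x = x.reshape(H // window, W // window, window, window, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def global_attention(fmap, num_heads):
    """Attention over every position of the feature map at once."""
    H, W, C = fmap.shape
    tokens = fmap.reshape(H * W, C)
    return multi_head_self_attention(tokens, num_heads).reshape(H, W, C)

# Hypothetical 8x8 feature map with 16 channels, 4x4 windows, 4 heads.
rng = np.random.default_rng(0)
fmap = rng.normal(size=(8, 8, 16))
local_out = local_window_attention(fmap, window=4, num_heads=4)
global_out = global_attention(fmap, num_heads=4)
```

In this sketch, a change to one position affects the local output only inside that position's window, whereas it affects the global output everywhere. This is the sense in which global attention provides a full receptive field and long-range dependency modeling.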