Traditional convolutional neural networks (CNNs) have a limited receptive field and a weak ability to learn interactions between features. As a result, conventional CNN-based deepfake face detection techniques extract relatively homogeneous features. To address these issues, this paper proposes a deepfake face detection method based on an enhanced Swin Transformer. The method introduces both local and global multi-head self-attention mechanisms. It builds on two strengths of the Swin Transformer, a strong global receptive field and the ability to model long-range dependencies, to effectively capture contextual information within images and temporal relationships across video frames. Experimental results on the DFDC dataset show that the proposed approach outperforms the baseline methods in deepfake face detection.
Key words
enhanced Swin Transformer/deepfake face detection/audiovisual decomposition/consistency analysis/feature fusion
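The abstract does not say how the local and global multi-head self-attention are implemented. The NumPy sketch below illustrates the general distinction. It assumes standard scaled dot-product attention with random projection weights, no relative position bias, and Swin-style non-overlapping windows without the shifted-window step. The function names, weights, and sizes are hypothetical and do not come from the paper. In global attention, every token attends to every other token. In local (window) attention, a token attends only to tokens inside its own window.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_qkv, w_out, num_heads):
    """Global MHSA (illustrative): every token attends to all N tokens.
    x: (N, C) tokens; w_qkv: (C, 3C); w_out: (C, C). Weights are assumed inputs."""
    n, c = x.shape
    d = c // num_heads
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)        # each (N, C)
    # Split channels into heads: (heads, N, d)
    q, k, v = (t.reshape(n, num_heads, d).transpose(1, 0, 2) for t in (q, k, v))
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, N, N)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, c)       # merge heads
    return out @ w_out

def window_self_attention(x, h, w, window, w_qkv, w_out, num_heads):
    """Local MHSA (illustrative, no shifted windows): tokens attend only
    within non-overlapping window x window patches of an h x w token grid.
    x: (h*w, C) tokens in row-major order."""
    assert h % window == 0 and w % window == 0, "grid must tile evenly"
    c = x.shape[-1]
    grid = x.reshape(h, w, c)
    out = np.empty_like(grid)
    for i in range(0, h, window):
        for j in range(0, w, window):
            patch = grid[i:i + window, j:j + window].reshape(-1, c)
            y = multi_head_self_attention(patch, w_qkv, w_out, num_heads)
            out[i:i + window, j:j + window] = y.reshape(window, window, c)
    return out.reshape(h * w, c)

# Usage with hypothetical sizes: an 8x8 token grid, 32 channels, 4 heads, 4x4 windows.
rng = np.random.default_rng(0)
H, W, C, heads, M = 8, 8, 32, 4, 4
x = rng.standard_normal((H * W, C))
w_qkv = rng.standard_normal((C, 3 * C)) / np.sqrt(C)
w_out = rng.standard_normal((C, C)) / np.sqrt(C)
local = window_self_attention(x, H, W, M, w_qkv, w_out, heads)
glob = multi_head_self_attention(x, w_qkv, w_out, heads)
print(local.shape, glob.shape)  # (64, 32) (64, 32)
```

Window attention costs grow with the window area rather than with the full token count, which is why Swin-style models restrict attention to windows. Global attention gives every token a view of the whole input. With a single window covering the whole grid, the two functions above coincide.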