Facial Forgery Detection Based on Key Frames and Fused Spatial-Temporal Features
Deep learning-based facial forgery detection is commonly approached as a binary classification problem. The accuracy of a trained model depends not only on the quality and quantity of the training data but also on the training strategy and the network architecture design. In this paper, we propose a new method based on key frames and spatial-temporal features. First, weighted optical flow energy analysis is used to detect the key frames in a video. Then, the optical flow and local binary pattern (LBP) features of the key frames are fused to form feature maps with spatial and temporal characteristics. After data augmentation, the feature maps are fed into a CNN model for training. Evaluations conducted on the FaceForensics++ and Celeb-DF datasets demonstrate that the proposed method achieves superior or comparable detection accuracy. Cross-dataset experiments show that the proposed method, using the EfficientNet-V2 structure, achieves the best performance on the FaceForensics++ database, with an accuracy of 90.1%. Furthermore, the overall performance of the XceptionNet structure surpasses that of other methods, achieving an accuracy of over 80%, which demonstrates the superior generalization performance of the proposed method.
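The pipeline summarized above (key-frame detection from optical flow energy, followed by fusion of optical flow and LBP features) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not give the weighting scheme, the key-frame selection rule, or the fusion layout, so the per-pixel weight map, the local-maximum rule, and the three-channel (u, v, LBP) stacking below are all assumptions, and every function name is hypothetical. The optical flow fields are assumed to be precomputed by some external estimator.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def lbp_8neighbour(gray: Sequence[Sequence[int]]) -> List[List[int]]:
    """Basic 3x3 LBP code per interior pixel; border pixels are left at 0."""
    h, w = len(gray), len(gray[0])
    # Neighbour offsets (dy, dx), clockwise starting from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            centre = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright.
                if gray[y + dy][x + dx] >= centre:
                    code |= 1 << bit
            out[y][x] = code
    return out


def flow_energy(flow_u: Grid, flow_v: Grid, weights: Grid) -> float:
    """Weighted sum of squared flow magnitudes (weight map is an assumption)."""
    total = 0.0
    for row_u, row_v, row_w in zip(flow_u, flow_v, weights):
        for u, v, wgt in zip(row_u, row_v, row_w):
            total += wgt * (u * u + v * v)
    return total


def select_key_frames(energies: Sequence[float]) -> List[int]:
    """Assumed rule: a frame is a key frame if its energy is a local maximum."""
    return [
        i for i in range(1, len(energies) - 1)
        if energies[i] > energies[i - 1] and energies[i] >= energies[i + 1]
    ]


def fuse(flow_u: Grid, flow_v: Grid,
         lbp: List[List[int]]) -> List[List[Tuple[float, float, int]]]:
    """Assumed fusion: stack (u, v, LBP code) per pixel as a 3-channel map."""
    return [
        [(u, v, c) for u, v, c in zip(row_u, row_v, row_c)]
        for row_u, row_v, row_c in zip(flow_u, flow_v, lbp)
    ]
```

In this sketch, one energy value would be computed per frame, the key-frame indices selected from the resulting sequence, and the fused map for each key frame passed on to augmentation and the CNN. A practical system would more likely use an array library and an established optical flow estimator rather than nested lists.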