Video preprocessing operations, mainly video frame sampling and data augmentation, are essential for improving the performance of deep video models in action recognition, and they have recently received increased attention. This study proposes a novel data preprocessing method for deep video action recognition models that addresses two weaknesses of existing preprocessing: insufficient guidance during key frame sampling and relatively simple data augmentation. First, a novel motion-guided fragmented video sampling technique is designed that jointly considers the features across different video frames and the short-term temporal features of video clips. It selects key frames guided by salient motion and then samples the frames adjacent to each key frame, which effectively improves the spatiotemporal discriminative capacity of the selected frames. Second, motivated by the success of random data augmentation in image classification, the study applies a random data augmentation strategy to the sampled short video clips, so that deep video recognition models can learn more complex spatial variations. In evaluation experiments on two public video recognition datasets with two representative network models, the proposed preprocessing method improves the accuracy of the baseline models by more than 2.5 percentage points, with a maximum improvement of 6.8 percentage points. These results demonstrate the effectiveness of the method for video action recognition.
video action recognition; preprocessing method; motion-guided fragmented video sampling; data augmentation; deep learning
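
The two preprocessing steps summarized in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how motion is scored or which augmentation operations are used, so the mean absolute frame difference used as a motion score, the operation pool, and the names `motion_guided_sample`, `random_augment_clip`, `num_segments`, `clip_len`, `num_ops`, and `magnitude` are all assumptions made for illustration.

```python
import numpy as np


def motion_guided_sample(frames, num_segments=4, clip_len=3):
    """Return one short clip of frame indices per temporal segment.

    Assumption: motion is scored as the mean absolute difference between a
    frame and its predecessor. `frames` has shape (T, H, W) or (T, H, W, C).
    """
    frames = np.asarray(frames, dtype=np.float32)
    t = frames.shape[0]
    if t < max(num_segments, clip_len, 2):
        raise ValueError("video is too short for the requested sampling")

    # Motion score per frame; frame 0 has no predecessor and scores 0.
    diffs = np.abs(frames[1:] - frames[:-1]).reshape(t - 1, -1).mean(axis=1)
    motion = np.concatenate([[0.0], diffs])

    # Split the video into equal, non-empty segments (integer boundaries).
    bounds = [i * t // num_segments for i in range(num_segments + 1)]
    half = clip_len // 2
    clips = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        # Key frame: the frame with the largest motion score in the segment.
        key = start + int(np.argmax(motion[start:end]))
        # Take neighbors around the key frame, clamped to the video length.
        first = min(max(key - half, 0), t - clip_len)
        clips.append(list(range(first, first + clip_len)))
    return clips


def random_augment_clip(clip, rng, num_ops=2, magnitude=0.3):
    """Apply `num_ops` randomly chosen operations to every frame of a clip.

    Assumption: pixel values lie in [0, 1] and the operation pool below is a
    hypothetical stand-in for the augmentation set used in the paper.
    """
    ops = {
        "flip": lambda x: x[:, :, ::-1],  # horizontal flip (width axis)
        "brightness": lambda x: np.clip(x + magnitude, 0.0, 1.0),
        "contrast": lambda x: np.clip(
            (x - x.mean()) * (1.0 + magnitude) + x.mean(), 0.0, 1.0
        ),
        "invert": lambda x: 1.0 - x,
    }
    names = [str(n) for n in rng.choice(sorted(ops), size=num_ops, replace=False)]
    out = np.asarray(clip, dtype=np.float32)
    for name in names:
        # The same operation is applied to all frames to keep the clip
        # temporally consistent.
        out = ops[name](out)
    return out, names
```

In this sketch, each index list returned by `motion_guided_sample` can be used to slice the video into a short clip, which is then passed to `random_augment_clip` together with a `numpy.random.Generator`. Applying the same sampled operations to every frame of a clip is a design choice of the sketch, intended to avoid introducing artificial frame-to-frame changes.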