A frame sequence feature based method for human action recognition
Human action recognition research has evolved in tandem with computer science and modern deep learning techniques, making it one of the most promising research directions in computer vision. However, the traditional two-stream network model is ineffective at extracting the inter-frame sequence features of a video simultaneously with its image and motion features, which reduces the robustness of the two-stream model when local sequence information and long-term motion information interact. In the proposed method, after pre-processing, the dense optical flow frames of a video are fed into a temporal network, while the RGB frames are fed into both a spatial network and a frame sequence feature extraction network; all three networks are pretrained concurrently. Once training is complete, features are extracted from each network and fused with a weighted parallel fusion algorithm, and the action categories are classified using a multi-layer perceptron (MLP). Experimental results on the UCF11, UCF50, and HMDB51 datasets demonstrate that our model effectively integrates the spatial-temporal and frame-sequence information of human actions, resulting in a significant improvement in recognition accuracy. Its classification accuracy on the three datasets was 99.17%, 97.40%, and 96.88%, respectively, improving on the generalization capability and validity of conventional two-stream or three-stream models.
Human action recognition; Three-stream network; Frame sequence feature; UCF11; UCF50; HMDB51
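The fusion and classification step summarized in the abstract can be illustrated with a minimal sketch. It assumes that "parallel fusion by adding weights" means an element-wise weighted sum of the three equally sized feature vectors, followed by a small MLP with a softmax output. The function names, feature dimension, hidden size, fusion weights, and randomly initialized (untrained) layers are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math
import random


def weighted_fusion(features, weights):
    """Element-wise weighted sum of equally sized feature vectors (assumed fusion rule)."""
    if len(features) != len(weights):
        raise ValueError("one weight per feature vector is required")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must share one dimension")
    return [sum(w * f[i] for f, w in zip(features, weights)) for i in range(dim)]


def dense(x, weight_rows, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight_rows, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    """Numerically stable softmax."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Random, untrained layer parameters (illustration only)."""
    rows = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return rows, [0.0] * n_out


def mlp_classify(x, layers):
    """Hidden layers use ReLU; the last layer feeds a softmax over classes."""
    for rows, bias in layers[:-1]:
        x = relu(dense(x, rows, bias))
    rows, bias = layers[-1]
    return softmax(dense(x, rows, bias))


rng = random.Random(0)
# UCF11 has 11 action classes; the other sizes are hypothetical.
FEATURE_DIM, HIDDEN, NUM_CLASSES = 8, 16, 11

# Stand-ins for features extracted by the three pretrained streams.
spatial = [rng.random() for _ in range(FEATURE_DIM)]
temporal = [rng.random() for _ in range(FEATURE_DIM)]
sequence = [rng.random() for _ in range(FEATURE_DIM)]

fusion_weights = [0.4, 0.3, 0.3]  # hypothetical weights, not from the paper
fused = weighted_fusion([spatial, temporal, sequence], fusion_weights)

layers = [init_layer(rng, FEATURE_DIM, HIDDEN), init_layer(rng, HIDDEN, NUM_CLASSES)]
probs = mlp_classify(fused, layers)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
print("predicted class index:", predicted)
```

In a real system the three feature vectors would come from the trained spatial, temporal, and frame sequence networks, and the fusion weights and MLP parameters would be learned or tuned rather than fixed by hand.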