VIDEO BEHAVIOR RECOGNITION BASED ON DYNAMIC SPATIOTEMPORAL INFORMATION FUSION
Video data contain complex and redundant information in both the temporal and spatial dimensions. To address this problem, we design a motion module. Based on temporal and spatial features, the module computes the temporal and spatial differences between pixels. The resulting dynamic spatiotemporal differences are decomposed into two processing branches: one corrects the temporal and spatial displacements between adjacent frames, and the other captures contextual information at adjacent moments. Within the time interval between adjacent frames, the temporal and spatial probability distribution of pixels is modeled. Experimental results show that the motion module improves video recognition performance at only a small cost in FLOPs and parameters, and its effectiveness and efficiency are verified on public datasets.
Keywords: Deep learning; Spatiotemporal features; Feature fusion; Behavior recognition
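The abstract does not say how the pixel differences are computed. The following is a minimal sketch of the general idea, not the authors' motion module. It assumes a single-channel video tensor of shape (T, H, W) and plain finite differences: a temporal difference between adjacent frames and forward spatial differences along height and width. A softmax over the magnitude of each temporal difference stands in for "the probability distribution of pixels" in each frame interval. The function names `temporal_difference`, `spatial_difference`, and `pixel_distribution`, and the softmax choice itself, are hypothetical illustrations.

```python
import numpy as np


def temporal_difference(video: np.ndarray) -> np.ndarray:
    """Difference between each pair of adjacent frames.

    video: array of shape (T, H, W); returns shape (T - 1, H, W).
    """
    return video[1:] - video[:-1]


def spatial_difference(video: np.ndarray) -> np.ndarray:
    """Forward finite differences along height and width for every frame.

    Returns shape (T, 2, H, W): channel 0 holds the vertical difference,
    channel 1 the horizontal one. The last row/column is left as zero.
    """
    dy = np.zeros_like(video)
    dx = np.zeros_like(video)
    dy[:, :-1, :] = video[:, 1:, :] - video[:, :-1, :]
    dx[:, :, :-1] = video[:, :, 1:] - video[:, :, :-1]
    return np.stack([dy, dx], axis=1)


def pixel_distribution(diff: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: softmax over pixels of each frame interval.

    diff: shape (T - 1, H, W) of temporal differences. Returns an array of
    the same shape in which each interval's values sum to 1, so pixels
    that changed most between adjacent frames get the most weight.
    """
    flat = np.abs(diff).reshape(diff.shape[0], -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(flat)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights.reshape(diff.shape)


# Usage: a bright pixel moving one column to the right in each of 3 frames.
video = np.zeros((3, 4, 4))
for t in range(3):
    video[t, 1, t] = 1.0

td = temporal_difference(video)   # shape (2, 4, 4)
sd = spatial_difference(video)    # shape (3, 2, 4, 4)
p = pixel_distribution(td)        # each of the 2 intervals sums to 1
```

In this toy input, the temporal difference is nonzero only at the pixel the object left (-1) and the pixel it entered (+1), so the softmax concentrates weight on those two locations. A learned module would use trainable layers rather than a fixed softmax.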