Human action recognition based on skeleton dynamic temporal filter
Human action recognition is one of the key research areas in computer vision, with a wide range of applications such as human-computer interaction and intelligent surveillance. Existing methods for skeleton-based action recognition often combine graph convolutional networks (GCN) with temporal convolutional networks (TCN). However, the limited size of the convolutional kernel restricts the models' global temporal modeling capability. Moreover, applying convolutional kernels to skeleton data provides little discriminative power among different skeleton joints. Furthermore, extracting features with TCN often involves repeated computation, and the number of TCN parameters grows as the network deepens. To address these issues, signal processing methods were adopted, and a skeleton dynamic temporal filtering (SDTF) module was proposed for skeleton-based action recognition, replacing TCN for global temporal modeling. On this basis, AGCN was made more lightweight, reducing its complexity. SDTF modeled temporal features in the frequency domain: it applied a Fourier transform, multiplied the resulting frequency-domain features by the frequency-domain output of the filter, and then applied an inverse Fourier transform. Extensive experiments on the NTU-RGBD and Kinetics-Skeleton datasets demonstrated that the proposed model significantly reduced the number of network parameters and the computational complexity, while achieving recognition performance comparable or even superior to that of the original model.
human action recognition; graph convolutional network; dynamic temporal filter; Fourier transform; temporal convolutional networks
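
The frequency-domain filtering step described in the abstract (Fourier transform, element-wise multiplication with a filter, inverse Fourier transform) can be illustrated with a minimal NumPy sketch. This is not the authors' SDTF implementation: the function name `temporal_frequency_filter`, the `(channels, frames, joints)` tensor layout, and the fixed filters below are assumptions made for illustration, and the abstract does not specify how the dynamic filter is generated from the input.

```python
import numpy as np


def temporal_frequency_filter(x, freq_filter):
    """Filter a skeleton sequence along time in the frequency domain.

    x           : real array of shape (C, T, V), assumed layout
                  (channels, frames, joints).
    freq_filter : complex array broadcastable to (C, T // 2 + 1, V);
                  a hypothetical stand-in for the learned filter.
    """
    num_frames = x.shape[1]
    # 1) Fourier transform along the temporal axis (real input -> rfft).
    spectrum = np.fft.rfft(x, axis=1)
    # 2) Element-wise multiplication in the frequency domain. Each frequency
    #    bin depends on every frame, so the receptive field is global.
    filtered = spectrum * freq_filter
    # 3) Inverse Fourier transform back to the temporal domain.
    return np.fft.irfft(filtered, n=num_frames, axis=1)


rng = np.random.default_rng(0)
C, T, V = 3, 16, 25  # illustrative sizes only
x = rng.standard_normal((C, T, V))

# An all-ones filter leaves the sequence unchanged.
identity = np.ones((C, T // 2 + 1, 1), dtype=complex)
y_identity = temporal_frequency_filter(x, identity)

# Keeping only the zero-frequency bin yields the temporal mean at every frame.
dc_only = np.zeros((C, T // 2 + 1, 1), dtype=complex)
dc_only[:, 0, :] = 1.0
y_dc = temporal_frequency_filter(x, dc_only)
```

In this sketch the filter needs one complex weight per channel and frequency bin regardless of network depth, which is the kind of saving over stacked temporal convolutions that the abstract points to; the actual parameterization used by SDTF may differ.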