To address the low accuracy of current human action recognition methods, this paper proposes DSTFP, a human action recognition method based on a dual spatiotemporal feature pyramid network. The method uses the SlowFast network as the backbone to extract features at different scales, and feeds these multi-scale features into a dual spatiotemporal feature pyramid to increase the network's sensitivity to multiple scales. The first stage is the semantic enhancement pyramid (SEEP), which fuses the multi-scale features from top to bottom so that high-level semantic information is propagated across the features of different scales. The second stage is the spatial enhancement pyramid (SPEP), which adopts bottom-up fusion so that spatial localization information is propagated across the features of different scales. Experimental results on the public AVA dataset show that the method achieves 24.97 mAP, which is 0.77 mAP higher than the original network, effectively improving the accuracy of human action recognition. Compared with similar algorithms, it better meets the requirements of practical applications.
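
The two-stage fusion order described above (a top-down pass followed by a bottom-up pass) can be illustrated with a toy sketch. This is not the DSTFP implementation: the abstract does not specify the fusion operators, so element-wise addition, nearest-neighbour upsampling, and 2x2 max pooling are assumptions borrowed from common feature-pyramid designs. The function names (`top_down`, `bottom_up`, `dual_pyramid`) are hypothetical, and single-channel 2D grids stand in for the spatiotemporal SlowFast features.

```python
from typing import List

Grid = List[List[float]]  # toy stand-in for one feature map (single channel, 2D)


def upsample2x(x: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling (assumed operator)."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2x(x: Grid) -> Grid:
    """2x2 max pooling with stride 2 (assumed operator)."""
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized maps (assumed fusion rule)."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down(features: List[Grid]) -> List[Grid]:
    """SEEP-like pass: coarse (semantic) levels are merged into finer ones.

    `features` is ordered from finest to coarsest resolution.
    """
    out = [features[-1]]
    for f in reversed(features[:-1]):
        out.insert(0, add(f, upsample2x(out[0])))
    return out


def bottom_up(features: List[Grid]) -> List[Grid]:
    """SPEP-like pass: fine (spatial) levels are merged into coarser ones."""
    out = [features[0]]
    for f in features[1:]:
        out.append(add(f, downsample2x(out[-1])))
    return out


def dual_pyramid(features: List[Grid]) -> List[Grid]:
    """Apply the top-down pass first, then the bottom-up pass."""
    return bottom_up(top_down(features))


if __name__ == "__main__":
    # Three toy levels: 4x4, 2x2 and 1x1 maps with constant values.
    feats = [
        [[1.0] * 4 for _ in range(4)],
        [[2.0] * 2 for _ in range(2)],
        [[4.0]],
    ]
    for level in dual_pyramid(feats):
        print(len(level), "x", len(level[0]), "value", level[0][0])
```

In this sketch every output level ends up carrying contributions from all input levels, which is the intuition behind running both passes. A real model would replace the fixed operators with learned layers and operate on multi-channel spatiotemporal tensors.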