Deep learning is currently widely applied in video recognition. However, in practical applications, deep neural networks face the threat of adversarial attacks. Existing studies focus primarily on attacks against image recognition models, while the attack and defense of video recognition models remain insufficiently studied. This paper proposes a multi-layer spatiotemporal attack method that uses an image model to attack video recognition models. It generates adversarial perturbations for videos in both space and time through multi-layer feature fusion and adversarial interaction between video frames. The method consists of a spatial attack module and a temporal attack module. In addition, adaptive noise and adaptive loss weight mechanisms are introduced to further improve the effectiveness and robustness of the attack. Extensive experiments on the UCF101 and Kinetics-400 datasets show that this method achieves a significantly higher attack success rate than existing techniques.
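As a rough illustration of the multi-layer feature idea only, the sketch below perturbs each video frame independently so that the intermediate features of a surrogate image model move away from their clean values, with the loss summed over two layers and optimized by sign-gradient ascent (I-FGSM style) under an L-infinity budget. Everything here is an assumption for illustration, not the paper's method: the toy two-layer NumPy network (`features`, `W1`, `W2`) is a hypothetical stand-in for a real image model, the per-layer weights `1 / loss` are a hypothetical stand-in for the adaptive loss weights, the uniform random start is a common heuristic rather than the paper's adaptive noise, and the temporal attack module is not modeled at all.

```python
import numpy as np


def features(x, W1, W2):
    """Hypothetical toy 'image model': linear -> ReLU -> linear.
    Returns the pre-activation and the two feature layers used by the attack."""
    pre = W1 @ x
    f1 = np.maximum(pre, 0.0)
    f2 = W2 @ f1
    return pre, f1, f2


def multilayer_feature_attack(frames, W1, W2, eps=0.03, alpha=0.005, steps=20, seed=0):
    """Sketch: per-frame feature-distortion attack summed over two layers.

    frames: array of shape (T, D) with pixel values in [0, 1].
    Returns adversarial frames with |adv - frames| <= eps, values in [0, 1].
    """
    rng = np.random.default_rng(seed)
    adv = []
    for x in frames:
        _, a1, a2 = features(x, W1, W2)  # clean features (fixed targets to move away from)
        # Random start: at delta = 0 the squared-distance gradient is exactly zero.
        delta = rng.uniform(-eps, eps, size=x.shape)
        delta = np.clip(x + delta, 0.0, 1.0) - x  # keep x + delta a valid image
        for _ in range(steps):
            pre, f1, f2 = features(x + delta, W1, W2)
            d1, d2 = f1 - a1, f2 - a2
            # Hypothetical adaptive weights: normalize each layer's loss by its magnitude.
            w1 = 1.0 / (np.sum(d1 ** 2) + 1e-12)
            w2 = 1.0 / (np.sum(d2 ** 2) + 1e-12)
            # Gradient of w1*||d1||^2 + w2*||d2||^2 w.r.t. the input (weights held constant).
            g_f1 = 2.0 * w1 * d1 + W2.T @ (2.0 * w2 * d2)
            grad = W1.T @ (g_f1 * (pre > 0))
            # Sign-gradient ascent step, then project onto the eps-ball and the valid range.
            delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
            delta = np.clip(x + delta, 0.0, 1.0) - x
        adv.append(np.clip(x + delta, 0.0, 1.0))
    return np.stack(adv)
```

Treating the per-layer weights as constants within each step keeps the gradient simple; recomputing them every step means no single layer dominates the combined loss just because its features have a larger scale.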