In recent years, the Vision Transformer (ViT) has shown remarkable potential in areas such as image classification, object detection, and image generation. However, ViT's performance gains depend on scaling the number of parameters in the network, which limits its application scenarios. Inspired by neurology, this paper proposes applying the recurrent structure found between neurons in the human brain to ViT. We explain, for the first time, how the recurrent structure works from the perspective of Riemannian geometry, and then present a recurrent Vision Transformer model built on the Token-to-Token Transformer architecture. Experimental results show that introducing the recurrent structure substantially improves the performance of ViT with essentially no change in the number of parameters: on the ImageNet classification dataset, the recurrent structure increases the parameter count by only 0.14% while improving classification accuracy by 9%. On the object detection task, a 0.1% increase in parameters brings a 10.7% performance improvement.
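The key property behind the near-zero parameter cost can be sketched in a few lines: a recurrent block reuses the same weights across iterations, so the effective depth grows while the parameter count stays fixed. The following is a minimal, hypothetical NumPy sketch of a weight-shared attention + MLP block, not the paper's exact architecture; all names (`RecurrentBlock`, `steps`, the layer sizes) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class RecurrentBlock:
    """One self-attention + MLP block whose weights are reused across
    `steps` iterations (weight sharing = recurrence).
    Hypothetical sketch, not the paper's exact architecture."""
    def __init__(self, d, rng):
        s = 1.0 / np.sqrt(d)
        self.Wq = rng.standard_normal((d, d)) * s
        self.Wk = rng.standard_normal((d, d)) * s
        self.Wv = rng.standard_normal((d, d)) * s
        self.W1 = rng.standard_normal((d, 4 * d)) * s
        self.W2 = rng.standard_normal((4 * d, d)) / np.sqrt(4 * d)

    def n_params(self):
        return sum(w.size for w in (self.Wq, self.Wk, self.Wv, self.W1, self.W2))

    def __call__(self, x, steps=1):
        d = x.shape[-1]
        # Same weights on every iteration: extra depth, zero extra parameters.
        for _ in range(steps):
            attn = softmax((x @ self.Wq) @ (x @ self.Wk).T / np.sqrt(d))
            x = x + attn @ (x @ self.Wv)                   # residual attention
            x = x + np.maximum(x @ self.W1, 0.0) @ self.W2  # residual ReLU MLP
        return x

rng = np.random.default_rng(0)
block = RecurrentBlock(d=64, rng=rng)
tokens = rng.standard_normal((16, 64))  # 16 tokens, 64-dim embeddings
y1 = block(tokens, steps=1)
y3 = block(tokens, steps=3)
# n_params() is identical for steps=1 and steps=3.
print(block.n_params())
```

Unrolling the loop three times triples the compute, but `n_params()` does not change, which is the mechanism by which a recurrent structure can raise accuracy at a sub-1% parameter cost.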