Batch reinforcement learning, now more commonly known as offline reinforcement learning, is an important branch of reinforcement learning. Although the need to learn from historical data had become increasingly pressing, offline reinforcement learning was not systematically proposed until 2020, and it has since become an important research topic in deep reinforcement learning. By learning from static datasets generated by behavior policies, without any online interaction with the environment, this paradigm converts large datasets into powerful decision-making engines. The rise of offline reinforcement learning has not only accelerated the development of decision engines but also provided researchers with a stable and efficient training framework. In recent years, offline reinforcement learning methods have received extensive attention and in-depth study, achieving remarkable results in practical applications. They have been applied to recommendation systems, navigation, driving, natural language processing, and robot control, as well as to healthcare and energy, and are regarded as one of the most promising approaches for applying reinforcement learning in the real world.

In this paper, we first introduce the background and theoretical foundations of offline reinforcement learning. Second, based on their solution strategies, we classify offline reinforcement learning methods into three major categories: model-free, model-based, and transformer-based, and we analyze the research status and development trends of each. These categories differ in focus and target distinct challenges, yielding incremental improvements in handling distribution shift. Model-free offline reinforcement learning methods focus on policy evaluation and improvement by directly exploiting the trajectory information in static data. In contrast, model-based offline reinforcement learning methods aim to learn a model of the environment dynamics from the static dataset and use it to optimize the policy. More recently, transformer-based offline reinforcement learning methods have gained prominence for their strong sequence-modeling ability, showing excellent performance in complex environments and on long-horizon sequential data.

Third, we compare the three most popular experimental environments: D4RL, RL Unplugged, and NeoRL. They offer rich datasets and standardized evaluation metrics for comparing the effectiveness and stability of offline reinforcement learning algorithms. D4RL and RL Unplugged are oriented toward simulation platforms, whereas NeoRL is oriented toward practical applications. Specifically, D4RL includes navigation, manipulation, and locomotion tasks; RL Unplugged includes manipulation, locomotion, and game tasks; and NeoRL includes an industrial benchmark, a stock-exchange simulator, and city-management tasks.

We then introduce applications of offline reinforcement learning in several real-world domains, which demonstrate its potential and value in solving practical problems. Finally, we summarize the field and offer prospects for future work, with the aim of promoting further research. As the theory of offline reinforcement learning is understood more deeply and the technology advances further, the field is expected to attract increasing research attention. Offline reinforcement learning combines the advantages of deep learning and reinforcement learning and is expected to provide smarter and more efficient solutions to a wide range of complex tasks.
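As a concrete illustration of the benchmark workflow compared above, the sketch below loads one D4RL dataset into the static transitions that offline algorithms consume. It is only a minimal example, assuming the open-source d4rl package with its bundled MuJoCo environments; the task name halfcheetah-medium-v2 is an illustrative choice, not one prescribed by this survey.

```python
# Minimal sketch: loading a static D4RL dataset for offline RL.
# Assumes the open-source `d4rl` package (with MuJoCo support) is installed;
# the task name below is just an illustrative example.
import gym
import d4rl  # importing d4rl registers the offline datasets with gym

env = gym.make("halfcheetah-medium-v2")

# Transitions collected by the behavior policy; no online interaction is needed.
dataset = d4rl.qlearning_dataset(env)
observations = dataset["observations"]          # shape (N, obs_dim)
actions = dataset["actions"]                    # shape (N, act_dim)
rewards = dataset["rewards"]                    # shape (N,)
next_observations = dataset["next_observations"]
terminals = dataset["terminals"]

print(f"{observations.shape[0]} transitions loaded for offline training")

# D4RL reports normalized scores, where 0 corresponds to a random policy
# and 100 to an expert policy; 4000.0 here is a placeholder raw return.
print(env.get_normalized_score(4000.0) * 100)
```

Such a dataset can then be fed to any of the model-free, model-based, or transformer-based methods surveyed in the paper, which is what makes these benchmarks a common basis for comparing effectiveness and stability.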