
A Review of Research on Offline Reinforcement Learning

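Offline reinforcement learning, also known as batch reinforcement learning, is an important research topic in deep reinforcement learning. It uses static datasets generated by behavior policies and requires no online interaction with the environment, successfully turning large-scale datasets into powerful decision engines. In recent years, offline reinforcement learning methods have received wide attention and in-depth study, and have achieved remarkable results in practice. These methods have been applied to recommendation systems, navigation and driving, natural language processing, robot control, healthcare, and energy, and are regarded as one of the most promising technical routes for applying reinforcement learning in the real world. This paper first introduces the background and theoretical foundations of offline reinforcement learning. Then, based on the solution approach, it divides offline reinforcement learning methods into three categories (model-free, model-based, and Transformer-based) and analyzes the research status and development trends of each. It also compares the three most popular experimental environments: D4RL, RL Unplugged, and NeoRL. It then introduces applications of offline reinforcement learning in many real-world domains. Finally, it summarizes offline reinforcement learning and discusses future prospects, with the aim of encouraging further research in the field.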
Batch reinforcement learning has long been an important branch of reinforcement learning. However, offline reinforcement learning was not systematically proposed until 2020, as the need to learn from historical data became more and more pressing. Offline reinforcement learning, also known as batch reinforcement learning, is now an important research topic in deep reinforcement learning. By using static datasets generated by behavior policies, without online interaction with the environment, this approach successfully converts large datasets into powerful decision engines. The rise of offline reinforcement learning has not only accelerated the development of decision engines but also provided researchers with a stable and efficient training framework. In recent years, offline reinforcement learning methods have received extensive attention and in-depth research, achieving remarkable results in practical applications. These methods have been used in recommendation systems, navigation and driving, natural language processing, robot control, healthcare, and energy, and are considered one of the most promising technical approaches for applying reinforcement learning in the real world.

In this paper, we first introduce the background and theoretical basis of offline reinforcement learning. Secondly, based on the solution approach, we classify offline reinforcement learning methods into three major categories (model-free, model-based, and Transformer-based) and analyze the research status and development trends of each. These categories differ in focus and address distinct challenges, each achieving incremental improvements in handling distribution shift. Model-free offline reinforcement learning methods focus on policy evaluation and improvement by directly using trajectory information from static data. In contrast, model-based offline reinforcement learning methods learn a model of the environment dynamics from static datasets and use it to optimize policies. More recently, Transformer-based offline reinforcement learning methods have gained prominence because of their strong sequence modeling abilities, showing exceptional performance on complex environments and long-horizon sequential data.

Thirdly, we compare the three most popular experimental environments: D4RL, RL Unplugged, and NeoRL. They offer rich datasets and standardized evaluation metrics for comparing the effectiveness and stability of offline reinforcement learning algorithms. D4RL and RL Unplugged are oriented towards simulation platforms, while NeoRL is oriented towards practical applications. Specifically, D4RL includes navigation, manipulation, and locomotion tasks; RL Unplugged includes manipulation, locomotion, and game tasks; and NeoRL includes industrial benchmarking, a stock exchange simulator, and city management tasks.

Then, we introduce applications of offline reinforcement learning in multiple real-world fields. These applications demonstrate the potential and value of offline reinforcement learning in solving real-world problems. Finally, we summarize offline reinforcement learning and discuss its prospects, to promote more research in this field. With a deeper understanding of the theory of offline reinforcement learning and further technological advancements, the field is expected to attract increasing research attention. Offline reinforcement learning combines the advantages of deep learning and reinforcement learning and is expected to provide smarter and more efficient solutions to a variety of complex tasks.
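The offline setting described in the abstract (a behavior policy produces a static dataset, and learning then proceeds with no further environment interaction) can be illustrated with a minimal tabular sketch. Everything below is a hypothetical toy and is not taken from the paper: the five-state chain, the uniform random behavior policy, and the names `step`, `collect_dataset`, and `offline_q_learning` are assumptions chosen only for illustration. It is plain Q-learning over a fixed dataset and includes none of the distribution-shift corrections that the surveyed methods add.

```python
import random

# Hypothetical toy chain MDP (an assumption for illustration only):
# states 0..4, state 4 is terminal and gives reward 1.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9


def step(state, action):
    """Toy deterministic dynamics; used ONLY to generate the dataset."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def collect_dataset(n_episodes=200, max_steps=20, seed=0):
    """Run a uniform random behavior policy and return a static list of
    (state, action, reward, next_state, done) transitions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            data.append((state, action, reward, next_state, done))
            if done:
                break
            state = next_state
    return data


def offline_q_learning(data, n_sweeps=100):
    """Repeated Bellman backups over the fixed dataset; never calls step()."""
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(n_sweeps):
        for state, action, reward, next_state, done in data:
            target = reward if done else reward + GAMMA * max(q[next_state])
            # The toy dynamics are deterministic, so the target can be
            # assigned directly instead of averaged with a learning rate.
            q[state][action] = target
    return q


if __name__ == "__main__":
    dataset = collect_dataset()
    q_table = offline_q_learning(dataset)
    greedy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
    print("greedy action per non-terminal state:", greedy)
```

In this sketch, any state-action pair that never appears in the dataset keeps its initial value of zero. In larger problems with function approximation, such unseen pairs can instead receive arbitrarily wrong estimates, which is one way to view the distribution-shift problem that the model-free, model-based, and Transformer-based methods discussed above each try to mitigate.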

artificial intelligence; reinforcement learning; deep reinforcement learning; offline reinforcement learning; batch reinforcement learning

乌兰、刘全、黄志刚、张立华


School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China


2025

Chinese Journal of Computers
China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences


Peking University Core Journal
Impact factor: 3.18
ISSN:0254-4164
Year, volume (issue): 2025, 48(1)