A survey on model-based reinforcement learning

Abstract: Reinforcement learning (RL) interacts with the environment to solve sequential decision-making problems via a trial-and-error approach. Errors are always undesirable in real-world applications, even though RL excels at playing complex video games that permit several trial-and-error attempts. To improve sample efficiency and thus reduce errors, model-based reinforcement learning (MBRL) is believed to be a promising direction, as it constructs environment models in which trial and error can occur without incurring actual costs. In this survey, we investigate MBRL with a particular focus on the recent advancements in deep RL. There is a generalization error between the learned model of a non-tabular environment and the actual environment. Consequently, it is crucial to analyze the disparity between policy training in the environment model and that in the actual environment, guiding algorithm design for improved model learning, model utilization, and policy training. In addition, we discuss the recent developments of model-based techniques in other forms of RL, such as offline RL, goal-conditioned RL, multi-agent RL, and meta-RL. Furthermore, we discuss the applicability and benefits of MBRL for real-world tasks. Finally, this survey concludes with a discussion of the promising future development prospects for MBRL. We believe that MBRL has great unrealized potential and benefits in real-world applications, and we hope this survey will encourage additional research on MBRL.

Keywords:

reinforcement learning; model-based reinforcement learning; planning; model learning; model learning with reduced error; model usage

Authors:

Fan-Ming LUO, Tian XU, Hang LAI, Xiong-Hui CHEN, Weinan ZHANG, Yang YU

Affiliations:

National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

Polixir.ai, Nanjing 211106, China

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China; National Natural Science Foundation of China

Grant numbers:

2020AAA0107200; 61876077; 62076161

Publication year:

2024

DOI:

10.1007/s11432-022-3696-5

Science China Information Sciences

Chinese Academy of Sciences

CSTPCD; EI

Impact factor: 0.715

ISSN: 1674-733X

Year, volume (issue): 2024, 67(2)

Number of references: 201