Survey of Research on Offline Reinforcement Learning
Offline reinforcement learning, as an emerging paradigm, learns from large amounts of previously collected data without requiring active interaction with the environment. It shows great potential and value, especially in high-risk domains such as healthcare and autonomous driving. This survey proceeds from the basic concepts of offline reinforcement learning to its core challenges and main methods, focusing on the various strategies proposed to mitigate distributional shift. These include constraining the target policy to stay close to the behavior policy, value function constraints, quantification of model uncertainty, and model-based offline reinforcement learning methods. Finally, the survey discusses current simulation environments for offline reinforcement learning and important application scenarios.
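As a minimal illustrative sketch of the first family of strategies (policy alignment), and not a formulation taken from any specific surveyed method, one common objective adds a divergence penalty that keeps the learned policy close to the behavior policy; here the symbols \(\pi\), \(\pi_{\beta}\), \(Q\), \(\alpha\), and \(\mathcal{D}\) are assumed notation rather than definitions from this article:

\[
\max_{\pi}\; \mathbb{E}_{s \sim \mathcal{D},\, a \sim \pi(\cdot \mid s)}\!\big[ Q(s, a) \big]
\;-\; \alpha\, D_{\mathrm{KL}}\!\big( \pi(\cdot \mid s) \,\|\, \pi_{\beta}(\cdot \mid s) \big),
\]

where \(\pi_{\beta}\) denotes the behavior policy that generated the offline dataset \(\mathcal{D}\), and the coefficient \(\alpha\) trades off return maximization against staying within the support of the data, which is the essence of mitigating distributional shift.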