SPaRM: an efficient exploration and planning framework for sparse reward reinforcement learning

Because of the long-horizon problem, the exploration phase of reinforcement learning (RL) requires a substantial number of visits to the state space to gather valuable information. Additionally, because rewards are sparse, the planning phase spends considerable time on repetitive and unproductive tasks before it adequately accesses the sparse reward signals. To address these challenges, this work proposes a space partitioning and reverse merging (SPaRM) framework based on reward-free exploration (RFE). The framework consists of two parts: the space partitioning module and the reverse merging module. The former partitions the entire state space into a specific number of subspaces to expedite the exploration phase, and this work establishes its theoretical sample complexity lower bound. The latter starts planning in reverse from near the target and gradually extends to the starting state, as opposed to the conventional practice of starting at the beginning. This lets the sparse rewards at the target enter the policy update process early. This work designs two experimental environments: a complex maze and a set of randomly generated maps. Compared with two state-of-the-art (SOTA) algorithms, experimental results validate the effectiveness and superior performance of the proposed algorithm.
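
The two modules can be illustrated with a toy example. The following is a minimal sketch, not the authors' algorithm: it assumes a deterministic grid world that is already fully known, square blocks as the subspaces, and plain value iteration as the planner. All names (`partition`, `block_order_from_goal`, `reverse_merge_plan`, `greedy_path`) and parameters are hypothetical. It only shows the ordering idea: planning starts in the block that contains the goal, and neighbouring blocks are merged in until the start state is covered.

```python
from collections import deque

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))

def partition(rows, cols, block):
    """Map each cell to the id of the square block that contains it."""
    return {(r, c): (r // block, c // block)
            for r in range(rows) for c in range(cols)}

def block_order_from_goal(cell_block, goal):
    """Breadth-first order of blocks, starting at the block holding the goal."""
    blocks = set(cell_block.values())
    first = cell_block[goal]
    order, seen, queue = [], {first}, deque([first])
    while queue:
        b = queue.popleft()
        order.append(b)
        for dr, dc in MOVES:
            nb = (b[0] + dr, b[1] + dc)
            if nb in blocks and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return order

def reverse_merge_plan(walls, rows, cols, start, goal,
                       block=2, gamma=0.9, sweeps=50):
    """Value iteration over a region that grows outward from the goal block."""
    cell_block = partition(rows, cols, block)
    value = {cell: 0.0 for cell in cell_block if cell not in walls}
    active = set()
    for b in block_order_from_goal(cell_block, goal):
        # Merge the next block into the region that is being planned over.
        active |= {c for c, cb in cell_block.items()
                   if cb == b and c not in walls}
        for _ in range(sweeps):
            for s in active:
                if s == goal:  # terminal state, value stays 0
                    continue
                best = 0.0
                for dr, dc in MOVES:
                    n = (s[0] + dr, s[1] + dc)
                    if n in active:
                        reward = 1.0 if n == goal else 0.0  # sparse reward
                        best = max(best, reward + gamma * value[n])
                value[s] = best
        if start in active:  # stop once the start state is covered
            break
    return value

def greedy_path(value, start, goal, gamma=0.9, max_steps=100):
    """Follow the greedy policy implied by the value table."""
    path, s = [start], start
    while s != goal and len(path) <= max_steps:
        options = [(s[0] + dr, s[1] + dc) for dr, dc in MOVES]
        options = [n for n in options if n in value]
        s = max(options,
                key=lambda n: (1.0 if n == goal else 0.0) + gamma * value[n])
        path.append(s)
    return path
```

For example, `reverse_merge_plan({(1, 1), (1, 2)}, 4, 4, (0, 0), (3, 3))` followed by `greedy_path` on the returned table gives a path from `(0, 0)` to `(3, 3)`. Because the goal block is planned first, the reward at the goal is already present in the value table when the more distant blocks are merged.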

reinforcement learning (RL), sparse reward, reward-free exploration (RFE), space partitioning (SP), reverse merging (RM)

BAN Jian (班健), LI Gongyan, XU Shaoyun

Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, P.R. China

University of Chinese Academy of Sciences, Beijing 100049, P.R. China

2024

High Technology Letters (English edition)
Institute of Scientific and Technical Information of China (ISTIC)

Impact factor: 0.058
ISSN: 1006-6748
Year, volume (issue): 2024, 30(4)