High Technology Letters, 2024, Vol. 30, Issue 4: 344-355. DOI: 10.3772/j.issn.1006-6748.2024.04.002

SPaRM: an efficient exploration and planning framework for sparse reward reinforcement learning

BAN Jian 1, LI Gongyan 2, XU Shaoyun 2

Author information

  • 1. Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, P. R. China; University of Chinese Academy of Sciences, Beijing 100049, P. R. China
  • 2. Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, P. R. China

Abstract

Due to the long-horizon problem, a substantial number of visits to the state space is required during the exploration phase of reinforcement learning (RL) to gather valuable information. Additionally, due to the challenge posed by sparse rewards, the planning phase of RL consumes a considerable amount of time on repetitive and unproductive tasks before adequately accessing the sparse reward signals. To address these challenges, this work proposes a space partitioning and reverse merging (SPaRM) framework based on reward-free exploration (RFE). The framework consists of two parts: the space partitioning module and the reverse merging module. The former partitions the entire state space into a specific number of subspaces to expedite the exploration phase, and this work establishes its theoretical sample complexity lower bound. The latter starts planning in reverse from near the target and gradually extends back to the starting state, as opposed to the conventional practice of starting at the beginning; this facilitates the early involvement of the sparse reward at the target in the policy update process. This work designs two experimental environments: a complex maze and a set of randomly generated maps. Compared with two state-of-the-art (SOTA) algorithms, the experimental results validate the effectiveness and superior performance of the proposed algorithm.
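The abstract describes the two modules only at a high level. As an illustration, the following is a minimal, hypothetical Python sketch of how these two ideas could look on a toy grid maze: the grid size N, the block-based partitioning rule subspace(), the per-block random-walk explorer, and the backward value-propagation planner are all assumptions made for this sketch, not the paper's actual algorithm.

```python
# Illustrative sketch of the two SPaRM ideas on a toy grid maze.
# Everything here (grid size, partitioning rule, random-walk explorer,
# backward value propagation) is an assumption for illustration only.
import random

N = 12                      # assumed grid side length
START, GOAL = (0, 0), (N - 1, N - 1)

def subspace(state, k=3):
    """Space partitioning: map a state to one of k*k block-shaped subspaces,
    so exploration can run per subspace instead of over the whole maze."""
    x, y = state
    return (x * k // N, y * k // N)

def explore_subspace(block, steps=200, k=3):
    """Reward-free exploration restricted to one subspace: a random walk
    that stays inside the block and records which states were visited."""
    visited = set()
    s = (block[0] * N // k, block[1] * N // k)   # a cell inside the block
    for _ in range(steps):
        visited.add(s)
        x, y = s
        nx, ny = x + random.choice([-1, 0, 1]), y + random.choice([-1, 0, 1])
        if 0 <= nx < N and 0 <= ny < N and subspace((nx, ny), k) == block:
            s = (nx, ny)
    return visited

def reverse_merge(visited, gamma=0.95, sweeps=50):
    """Reverse merging (sketch): propagate the sparse goal reward backward
    through the visited states, so states near the target get values first."""
    V = {s: 0.0 for s in visited}
    V[GOAL] = 1.0
    for _ in range(sweeps):
        for (x, y) in visited:
            if (x, y) == GOAL:
                continue
            neighbors = [(x + dx, y + dy) for dx in (-1, 0, 1)
                         for dy in (-1, 0, 1) if (x + dx, y + dy) in V]
            V[(x, y)] = gamma * max(V[n] for n in neighbors)
    return V

# Explore each subspace independently, then merge and plan backward.
visited = set()
for bx in range(3):
    for by in range(3):
        visited |= explore_subspace((bx, by))
visited |= {START, GOAL}
V = reverse_merge(visited)
print(f"visited {len(visited)} states, V(start) = {V[START]:.3f}")
```

The point of the sketch is the control flow rather than the numerics: exploration is confined to one subspace at a time, and the value of the sparse target reward is propagated backward over the visited states before any forward policy would be derived.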

Key words

reinforcement learning (RL) / sparse reward / reward-free exploration (RFE) / space partitioning (SP) / reverse merging (RM)

