
Robot manipulation skills learning for sparse rewards

Robot manipulation skills learning based on deep reinforcement learning has become a research hotspot. However, because of the sparse-reward nature of such tasks, learning efficiency is low. This paper proposes a double experience replay buffer adaptive soft hindsight experience replay (DAS-HER) algorithm based on meta-learning and applies it to manipulation skills learning with sparse rewards. First, building on the soft hindsight experience replay (SHER) algorithm, a simplified value function that improves algorithmic efficiency is derived, and a temperature-adaptive adjustment strategy is introduced to dynamically tune the temperature parameter for different task environments. Second, drawing on meta-learning, the experience replay buffer is split, and the ratio of real sampled data to constructed virtual data is adjusted dynamically during training, yielding the DAS-HER method. Third, a general framework for robot manipulation skills learning under sparse rewards is constructed, and DAS-HER is applied to robot manipulation skills learning within it. Finally, comparative experiments on eight tasks in the Fetch and Hand environments under MuJoCo show that the proposed algorithm outperforms the other algorithms in both training efficiency and success rate.
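The mechanisms the abstract names — a sparse goal-conditioned reward, hindsight relabeling of failed episodes into virtual successes, and mixed sampling from real and virtual experience buffers — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `final` relabeling strategy, the 0.05 success tolerance, and the fixed `real_ratio` argument (which DAS-HER would instead adapt via meta-learning during training) are assumptions made for the sketch.

```python
import random

def sparse_reward(achieved_goal, desired_goal, tol=0.05):
    # Binary sparse reward typical of Fetch/Hand tasks:
    # 0 on success (goal reached within tolerance), -1 otherwise.
    dist = sum((a - d) ** 2 for a, d in zip(achieved_goal, desired_goal)) ** 0.5
    return 0.0 if dist < tol else -1.0

def relabel_episode(episode):
    """HER 'final' strategy: replay every transition of a (possibly failed)
    episode with the episode's last achieved goal substituted as the desired
    goal, so at least the final transition becomes a success."""
    final_goal = episode[-1]["achieved_goal"]
    virtual = []
    for t in episode:
        virtual.append({
            "obs": t["obs"],
            "action": t["action"],
            "achieved_goal": t["achieved_goal"],
            "desired_goal": final_goal,
            "reward": sparse_reward(t["achieved_goal"], final_goal),
        })
    return virtual

def sample_batch(real_buffer, virtual_buffer, batch_size, real_ratio):
    """Draw a mixed minibatch from the two buffers; in DAS-HER the
    real/virtual ratio would be adapted by the meta-learner, not fixed."""
    n_real = int(batch_size * real_ratio)
    batch = random.sample(real_buffer, min(n_real, len(real_buffer)))
    batch += random.sample(virtual_buffer,
                           min(batch_size - len(batch), len(virtual_buffer)))
    return batch
```

Relabeling matters because under a sparse reward a failed episode carries no gradient signal at all; rewriting its goal to something actually achieved turns it into informative (virtual) training data.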

robot manipulation skills learning; reinforcement learning; sparse reward; maximum entropy methods; adaptive temperature parameters; meta-learning

吴培良、张彦、毛秉毅、陈雯柏、高国伟


School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China

Hebei Key Laboratory of Computer Virtual Technology and System Integration, Qinhuangdao 066004, Hebei, China

School of Automation, Beijing Information Science and Technology University, Beijing 100192, China


National Key R&D Program of China; Regional Joint Fund of the National Natural Science Foundation of China; Beijing Natural Science Foundation; Natural Science Foundation of Hebei Province

2018YFB1308300; U20A20167; 4202026; F202103079

2024

Control Theory & Applications
South China University of Technology; Academy of Mathematics and Systems Science, Chinese Academy of Sciences


Indexed by: CSTPCD; Peking University Core Journals
Impact factor: 1.076
ISSN: 1000-8152
Year, volume (issue): 2024, 41(1)