Journal of Shanghai Normal University (Natural Sciences), 2024, Vol. 53, Issue 2: 260-267. DOI: 10.3969/J.ISSN.1000-5137.2024.02.018

Research on a multi-mode gait system for biped robots based on hierarchical control with deep reinforcement learning

徐毓松 上官倩芡 安康

Author information

  • 1. College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China

Abstract

Bipedal robot gait control still faces shortcomings and challenges in stability and generalization in complex scenarios. To address this, a multi-mode gait generation system for bipedal robots based on hierarchical control with deep reinforcement learning (DRL) is proposed. First, an advantage actor-critic framework is adopted as the high-level control policy, which is optimized using the proximal policy optimization (PPO) algorithm and the idea of curriculum learning (CL), and a proportional-derivative (PD) controller is designed as the low-level controller. Next, the robot's observation and action spaces are defined for policy parameterization, and a gait-cycle reward function and a stepping function are designed based on the periodic nature of symmetric bipedal walking gaits. Finally, multi-mode task scenarios are designed by generating footstep sequences, and the feasibility of the method is verified on the MuJoCo simulation platform. The results show that the method effectively improves the stability and generalization of bipedal robot walking in complex environments.
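The two control levels named in the abstract can be illustrated with a minimal sketch. The abstract gives no gains, action definition, or loss details, so everything below is an assumption for illustration: the high-level policy is assumed to output joint position targets, the names PDController and ppo_clipped_objective are hypothetical, and the clipped surrogate shown is the standard per-sample PPO form rather than the authors' exact objective.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PDController:
    """Low-level joint controller: torque = kp * (q_target - q) - kd * q_dot.

    The gains are hypothetical; the abstract does not report the values used.
    """
    kp: float  # proportional gain
    kd: float  # derivative gain

    def torques(self, q_target: Sequence[float], q: Sequence[float],
                q_dot: Sequence[float]) -> List[float]:
        # One torque per joint, tracking the target issued by the high-level policy.
        return [self.kp * (qt - qi) - self.kd * qdi
                for qt, qi, qdi in zip(q_target, q, q_dot)]


def ppo_clipped_objective(ratio: float, advantage: float,
                          epsilon: float = 0.2) -> float:
    """Standard per-sample PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Assumed usage: the policy proposes joint targets, the PD loop turns them into torques.
    controller = PDController(kp=10.0, kd=1.0)
    print(controller.torques(q_target=[0.5, -0.2], q=[0.0, 0.0], q_dot=[0.2, 0.0]))
    print(ppo_clipped_objective(ratio=1.5, advantage=1.0))
```

In a hierarchy of this kind, the learned policy typically runs at a lower rate than the PD loop, which keeps the learned action space small while the fast inner loop handles joint tracking.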

Key words

bipedal robot / gait planning / proximal policy optimization (PPO) / multi-mode task / curriculum learning (CL)

Publication year

2024
Journal of Shanghai Normal University (Natural Sciences)
Shanghai Normal University

Impact factor: 0.255
ISSN: 1000-5137
Number of references: 9