首页|融合元学习和PPO算法的四足机器人运动技能学习方法

融合元学习和PPO算法的四足机器人运动技能学习方法

扫码查看
具备学习能力是高等动物智能的典型表现特征,为探明四足动物运动技能学习机理,本文对四足机器人步态学习任务进行研究,复现了四足动物的节律步态学习过程。近年来,近端策略优化(PPO)算法作为深度强化学习的典型代表,普遍被用于四足机器人步态学习任务,实验效果较好且仅需较少的超参数。然而,在多维输入输出场景下,其容易收敛到局部最优点,表现为四足机器人学习到步态节律信号杂乱且重心震荡严重。为解决上述问题,在元学习启发下,基于元学习具有刻画学习过程高维抽象表征优势,本文提出了一种融合元学习和PPO思想的元近端策略优化(MPPO)算法,该算法可以让四足机器人进化学习到更优步态。在PyBullet仿真平台上的仿真实验结果表明,本文提出的算法可以使四足机器人学会行走运动技能,且与柔性行动者评价器(SAC)和PPO算法的对比实验显示,本文提出的MPPO算法具有步态节律信号更规律、行走速度更快等优势。
A quadruped robot kinematic skill learning method integrating meta-learning and PPO algorithms
Learning ability is a typical characteristic of higher animal intelligence.In order to explore the learning mechanism of quadruped motor skills,this paper studies the gait learning task of quadruped robots,and reproduces the rhythmic gait learning process of quadruped animals from scratch.In recent years,proximal policy optimization(PPO)algorithm,as a typical representative algorithm of deep reinforcement learning,has been widely used in gait learning tasks for quadruped robots,with good experimental results and fewer hyperparameters required.However,in the multi-dimensional input and output scenario,it is easy to converge to the local optimum point,in the experimental environment of this study,the gait rhythm signals of the trained quadruped robot were irregular,and the center of gravity oscillates.To solve the above problems,inspired by meta-learning,based on the advantage of meta-learning in characterizing the high-dimensional abstract representation of learning processes,this paper proposes an meta proximal policy optimization(MPPO)algorithm that combines meta-learning and PPO algorithms.This algorithm can enable quadruped robots to learn better gait.The simulation results on the PyBullet simulation platform show that the algorithm proposed in this paper can enable quadruped robots to learn walking skills.Compared with soft actor-critic(SAC)and PPO algorithms,the MPPO algorithm proposed in this paper has advantages such as more regular gait rhythm signals and faster walking speed.

quadruped robotgait learningreinforcement learningmeta-learning

朱晓庆、刘鑫源、阮晓钢、张思远、李春阳、李鹏

展开 >

北京工业大学信息学部,北京100020

计算智能与智能系统北京市重点实验室,北京100020

四足机器人 步态学习 强化学习 元学习

国家自然科学基金北京市自然科学基金

621030094202005

2024

控制理论与应用
华南理工大学 中国科学院数学与系统科学研究院

控制理论与应用

CSTPCD北大核心
影响因子:1.076
ISSN:1000-8152
年,卷(期):2024.41(1)
  • 32