A quadruped robot kinematic skill learning method integrating meta-learning and PPO algorithms
Learning ability is a typical characteristic of intelligence in higher animals. To explore the learning mechanism of quadruped motor skills, this paper studies the gait learning task of quadruped robots and reproduces, from scratch, the rhythmic gait learning process of quadruped animals. In recent years, the proximal policy optimization (PPO) algorithm, a representative deep reinforcement learning algorithm, has been widely used in gait learning tasks for quadruped robots, as it gives good experimental results and requires relatively few hyperparameters. However, in scenarios with multi-dimensional inputs and outputs, PPO easily converges to a local optimum. In the experimental environment of this study, the gait rhythm signals of the quadruped robot trained with PPO were irregular, and its center of gravity oscillated. To solve these problems, and drawing on the strength of meta-learning in capturing high-dimensional abstract representations of the learning process, this paper proposes a meta proximal policy optimization (MPPO) algorithm that combines meta-learning with PPO. This algorithm enables quadruped robots to learn better gaits. Simulation results on the PyBullet platform show that the proposed algorithm enables quadruped robots to learn walking skills. Compared with the soft actor-critic (SAC) and PPO algorithms, MPPO produces more regular gait rhythm signals and a faster walking speed.