
Research on Gait Switching Method Based on Speed Requirement

Real-time gait switching of a quadruped robot as its speed changes is a difficult problem in robotics research, and applying reinforcement learning to it is a novel solution. In this paper, a quadruped robot simulation platform is built on the Robot Operating System (ROS). OpenAI Gym is used as the reinforcement learning framework, and the Proximal Policy Optimization (PPO) algorithm is used for quadruped robot gait switching. The training task is to learn different gait parameters for different speed inputs, including gait type, gait cycle, gait offset, and gait interval. The trained gait parameters are then used as the input to a Model Predictive Control (MPC) controller, which calculates the joint forces/torques. The calculated joint forces are transmitted to the joint motors of the quadruped robot to drive joint rotation, realizing gait switching at different speeds. The robot can thus more realistically imitate the gait transitions of animals: walking at very low speed, trotting at medium speed, and galloping at high speed.

This paper integrates a variety of factors affecting the gait training of a quadruped robot and applies reward constraints in several aspects, including a velocity reward, a time reward, an energy reward, and a balance reward. Each reward is given a different weight, and the instant reward at each training step is obtained by multiplying each reward by its own weight and summing the results, which ensures the reliability of the training. Multiple groups of comparative simulation experiments were carried out. The results show that the priorities of the balance reward, velocity reward, energy reward, and time reward decrease successively, and that the weight of each reward does not exceed 0.5. When the policy network and value network are designed as three-layer neural networks with 64 neurons per layer and a discount factor of 0.99, the training effect is better.
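The weighted instant reward described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the placeholder weight values, and the sample per-term rewards are all assumptions; only the four reward terms, their priority order (balance > velocity > energy > time), and the 0.5 weight ceiling come from the abstract.

```python
def combined_reward(rewards, weights):
    """Instant reward at one training step: each reward term is
    multiplied by its own weight, and the products are summed."""
    return sum(weights[k] * rewards[k] for k in rewards)

# Placeholder weights respecting the paper's findings: balance has the
# highest priority, then velocity, energy, and time, and no single
# weight exceeds 0.5. The exact values here are illustrative only.
weights = {"balance": 0.4, "velocity": 0.3, "energy": 0.2, "time": 0.1}

# Hypothetical per-term rewards for one step (cost terms are negative).
rewards = {"balance": 1.0, "velocity": 0.8, "energy": -0.2, "time": -0.1}

r = combined_reward(rewards, weights)  # 0.4 + 0.24 - 0.04 - 0.01 = 0.59
```

In practice a value like `r` would be returned from the environment's step function at each timestep of PPO training.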

Gait switching; Reinforcement learning; Proximal policy optimization; MPC controller

Weijun Tian, Kuiyue Zhou, Jian Song, Xu Li, Zhu Chen, Ziteng Sheng, Ruizhi Wang, Jiang Lei, Qian Cong


Key Laboratory of Bionic Engineering of Ministry of Education, Jilin University, Changchun 130022, China

North-Vehicle Research,Fengtai District,Beijing 100072,China

2024

Journal of Bionic Engineering (English Edition)
Jilin University


Indexed in: CSTPCD, EI
Impact factor: 0.837
ISSN:1672-6529
Year, Volume (Issue): 2024, 21(6)