Research on Intelligent Control of Biped Robot Based on D-DQN Reinforcement Learning Algorithm

To address the large trajectory deviation and low efficiency of existing intelligent control algorithms for biped robots, a control algorithm based on D-DQN reinforcement learning is proposed. First, the coordinate transformation relationships in biped robot motion and the compensation process of the joints and links are analyzed. A Q-value network is then used to reduce the dimensionality of the complex nonlinear motion process. A dual-weight design that combines the Q-value network weights with auxiliary weights further strengthens the DQN network, and the Tanh function is used as the activation function of the neural network to improve the numerical training ability of the DQN network. The experience replay pool plays a key supporting role in data training and interaction, and the reward value is fed into the objective function to further improve the control accuracy of the biped robot. Finally, virtual constraint control is used to improve the stability of the robot during motion. Experimental results show that, under the D-DQN reinforcement learning control algorithm, the robot completes the first-stage test in only 115 s with a comprehensive trajectory deviation of 0.02 m, and shows good stability in the gait-switching limit cycle test.
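
The abstract names four ingredients: two sets of network weights (Q-value network weights and auxiliary weights), a Tanh activation, an experience replay pool, and a reward-driven objective. The sketch below shows how these pieces commonly fit together. It assumes that "D-DQN" refers to the standard Double DQN formulation, which the abstract does not confirm. Every name, layer size, and hyperparameter here (`ReplayPool`, `init_weights`, `q_values`, `double_dqn_targets`, `gamma`) is hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import random
from collections import deque

import numpy as np


class ReplayPool:
    """Fixed-size experience replay pool of (s, a, r, s_next, done) tuples."""

    def __init__(self, capacity):
        # The oldest transitions are discarded once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, float(done)))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


def init_weights(state_dim, hidden_dim, n_actions, seed=0):
    """Random weights for a one-hidden-layer Q-network (sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (state_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, n_actions)),
        "b2": np.zeros(n_actions),
    }


def q_values(weights, states):
    """Forward pass: Tanh hidden layer followed by a linear Q-value output."""
    hidden = np.tanh(states @ weights["W1"] + weights["b1"])
    return hidden @ weights["W2"] + weights["b2"]


def double_dqn_targets(online, auxiliary, rewards, next_states, dones, gamma=0.99):
    """Training targets in the Double DQN style (assumed, see lead-in).

    The online weights select the next action; the auxiliary weights
    evaluate it. Terminal transitions (done == 1) keep only the reward.
    """
    best_actions = np.argmax(q_values(online, next_states), axis=1)
    next_q = q_values(auxiliary, next_states)
    evaluated = next_q[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated
```

In a training loop, these targets would be regressed against the online network's Q-value for the action actually taken, and the auxiliary weights would be refreshed from the online weights at intervals. Decoupling action selection from action evaluation is the usual motivation for keeping two sets of weights, since it tends to reduce overestimated Q-values.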

Keywords: D-DQN; reinforcement learning; bipedal robot; intelligent control; experience replay pool; virtual constraint control

Li Lixia (李丽霞), Chen Yan (陈艳)

Guangzhou Huashang College, Guangzhou 511300

Funding: Guangzhou Huashang College Higher Education Teaching Reform Project (2022), project no. HS2022ZLGC71

2024

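Computer Measurement & Control (计算机测量与控制)
China Computer Automatic Measurement and Control Technology Association

CSTPCD
Impact factor: 0.546
ISSN: 1671-4598
Year, volume (issue): 2024, 32(3)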