Simulation of Sequential Pattern Mining for Obstacle Avoidance Trajectory of Robotic Arms under DDPG Algorithm

A robotic arm generates a large amount of trajectory data as it moves. Because of sensor error, environmental instability, and other factors, the collected trajectory data contain noise and uncertainty. These disturbances reduce the accuracy of pattern mining and make it difficult to extract successful samples. To address this, a DDPG-based method for mining sequential patterns in robotic-arm obstacle avoidance trajectories is proposed. The obstacle avoidance problem is first analyzed to establish the fundamental goal of the mining task. Deep Deterministic Policy Gradient (DDPG) is then chosen as the base algorithm, and a reward function is designed for it to improve convergence. A Sum Tree is introduced into the DDPG experience replay to build a weighted-sampling DDPG, which is used to mine the optimal obstacle avoidance trajectory sequence pattern. Experimental results show that the proposed method achieves a mining success rate above 96% and a mining time within 2 ms, and that it effectively raises the mean cumulative reward.
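The abstract names a Sum Tree inside the experience replay as the mechanism behind weighted sampling, but it does not give details. The following is a minimal sketch of that general data structure, not the authors' implementation. The names `SumTree`, `add`, `update`, `get`, and `sample` are hypothetical, and how priorities are computed (for example from the TD error) is an assumption left to the caller.

```python
import random


class SumTree:
    """Array-backed binary tree: leaves hold priorities, parents hold sums."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Indices 0..capacity-2 are internal nodes, the rest are leaves.
        self.tree = [0.0] * (2 * capacity - 1)
        self.data = [None] * capacity
        self.write = 0  # next slot to overwrite (ring buffer)

    def total(self):
        """Sum of all stored priorities (value at the root)."""
        return self.tree[0]

    def add(self, priority, transition):
        """Store a transition with the given priority, overwriting the oldest."""
        leaf = self.write + self.capacity - 1
        self.data[self.write] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        """Set a leaf's priority and propagate the change up to the root."""
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf > 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def get(self, value):
        """Find the leaf whose cumulative-priority interval contains value."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]


def sample(tree, batch_size, rng=random):
    """Stratified draw: one value per equal-width segment of [0, total]."""
    segment = tree.total() / batch_size
    return [
        tree.get(rng.uniform(segment * i, segment * (i + 1)))
        for i in range(batch_size)
    ]
```

Each draw and each priority update costs O(log n) in the buffer size, which is the usual reason for choosing a sum tree over a linear scan of priorities. Transitions with larger priorities occupy wider intervals and are therefore replayed more often.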

Keywords: Deep Deterministic Policy Gradient (DDPG); obstacle avoidance of manipulator; trajectory sequence mode; reward function

Li Luke (李路可), Yang Jie (杨杰)

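School of Engineering, Zhengzhou Technology and Business University, Zhengzhou, Henan 451400, China

School of Mechanical Engineering, North China University of Water Resources and Electric Power, Zhengzhou, Henan 450045, China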

2024

Computer Simulation (计算机仿真)
The 17th Research Institute of China Aerospace Science and Industry Corporation

CSTPCD
Impact factor: 0.518
ISSN: 1006-9348
Year, volume (issue): 2024, 41(11)