Computer Simulation (计算机仿真), 2024, Vol. 41, Issue 11: 448-452.

DDPG算法下机械臂避障轨迹序列模式挖掘仿真

Simulation of Sequential Pattern Mining for Obstacle Avoidance Trajectory of Robotic Arms under DDPG Algorithm

Li Luke (李路可) 1, Yang Jie (杨杰) 2

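Author information

  • 1. School of Engineering, Zhengzhou Technology and Business University (郑州工商学院工学院), Zhengzhou, Henan 451400
  • 2. School of Mechanical Engineering, North China University of Water Resources and Electric Power (华北水利水电大学机械学院), Zhengzhou, Henan 450045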

Abstract

A manipulator generates a large amount of trajectory data as it moves. Because of sensor error, environmental instability, and other factors, the collected trajectory data contain noise and uncertainty, which reduces the accuracy of pattern mining and makes successful samples difficult to extract. To address this, a DDPG-based method for mining sequential patterns in manipulator obstacle avoidance trajectories is proposed. The obstacle avoidance problem is first analyzed to establish the fundamental goal of trajectory sequential pattern mining. Deep Deterministic Policy Gradient (DDPG) is then chosen as the base algorithm, and a reward function is designed for it to improve convergence. A Sum Tree is introduced into the experience replay of DDPG to build a weighted-sampling DDPG, which is used to mine the optimal obstacle avoidance trajectory sequential pattern. Experimental results show that the proposed method achieves a mining success rate above 96% and a mining time within 2 ms, and that it effectively raises the mean cumulative reward.
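The abstract names a Sum Tree inside the experience replay as the mechanism behind weighted sampling, but does not give its details. As an illustration only, the following is a minimal Python sketch of a generic sum tree of the kind commonly used for prioritized replay. The class name `SumTree`, its method names, and the idea of passing an arbitrary positive priority per transition are assumptions made for this sketch, not the authors' implementation.

```python
import random


class SumTree:
    """Hypothetical sketch: leaves hold priorities, internal nodes hold sums."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Array layout: indices 0..capacity-2 are internal nodes,
        # indices capacity-1..2*capacity-2 are leaves.
        self.tree = [0.0] * (2 * capacity - 1)
        self.data = [None] * capacity
        self.write = 0  # next leaf slot to overwrite (ring buffer)

    def total(self):
        # The root stores the sum of all leaf priorities.
        return self.tree[0]

    def add(self, priority, transition):
        leaf = self.write + self.capacity - 1
        self.data[self.write] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        # Set a leaf's priority and propagate the difference up to the root.
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def get(self, value):
        # Walk down from the root to the leaf whose cumulative
        # priority interval contains `value`.
        idx = 0
        while 2 * idx + 1 < len(self.tree):
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]

    def sample(self, batch_size):
        # Stratified sampling: one draw from each equal slice of the total.
        segment = self.total() / batch_size
        return [
            self.get(random.uniform(i * segment, (i + 1) * segment))
            for i in range(batch_size)
        ]
```

In a sketch like this, each stored transition is drawn with probability proportional to its priority, and both updating a priority and drawing a sample take time logarithmic in the buffer capacity. How the priorities themselves are computed (for example from the temporal-difference error, as in standard prioritized replay) is not stated in the abstract and is left open here.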

Key words

Deep Deterministic Policy Gradient (DDPG)/Manipulator obstacle avoidance/Trajectory sequential pattern/Reward function

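Publication year

2024
Computer Simulation (计算机仿真)
The 17th Research Institute of China Aerospace Science and Industry Corporation (中国航天科工集团公司第十七研究所)

CSTPCD
Impact factor: 0.518
ISSN: 1006-9348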