During the motion of a manipulator, a large amount of trajectory data is generated. Owing to sensor error, environmental instability, and other factors, the collected trajectory data may contain noise and uncertainty. These interferences reduce the accuracy of pattern mining and make it difficult to extract successful samples. Therefore, a method for mining the sequence patterns of manipulator obstacle avoidance trajectories based on Deep Deterministic Policy Gradient (DDPG) is proposed. First, the obstacle avoidance problem is analyzed to establish the fundamental goal of mining obstacle avoidance trajectory sequence patterns. Then, DDPG is adopted as the basic algorithm for mining these patterns, and a reward function is designed to improve the convergence of the algorithm. Moreover, a Sum Tree is introduced into the experience replay of DDPG to build a weighted-sampling DDPG, which realizes optimal pattern mining for the manipulator. Experimental results show that the success rate of the proposed method exceeds 96% and the mining time is within 2 ms, while the mean cumulative reward is effectively improved.
Key words
Deep Deterministic Policy Gradient (DDPG) / Manipulator obstacle avoidance / Trajectory sequence pattern / Reward function
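
The abstract states that a Sum Tree is introduced into the experience replay of DDPG to obtain weighted sampling. The following is a minimal sketch of that general idea, not the authors' implementation: the class `SumTree`, its methods, and the helper `sample` are hypothetical names, and the priority values are placeholders, since the abstract does not specify how the sampling weights are computed. Leaves store one priority per stored transition, each internal node stores the sum of its children, and a transition is drawn with probability proportional to its priority.

```python
import random


class SumTree:
    """Array-backed binary tree: leaves hold priorities, parents hold child sums."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes first, then leaves
        self.data = [None] * capacity           # stored transitions
        self.write = 0                          # next slot to overwrite (ring buffer)

    def total(self):
        # The root holds the sum of all leaf priorities.
        return self.tree[0]

    def add(self, priority, item):
        leaf = self.write + self.capacity - 1
        self.data[self.write] = item
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        # Set the leaf priority and propagate the difference up to the root.
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def get(self, value):
        # Descend from the root: go left if value fits in the left subtree,
        # otherwise subtract the left sum and go right.
        idx = 0
        while 2 * idx + 1 < len(self.tree):
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]


def sample(tree, batch_size):
    # Stratified sampling: split [0, total] into equal segments and
    # draw one value uniformly from each segment.
    segment = tree.total() / batch_size
    return [
        tree.get(random.uniform(segment * i, segment * (i + 1)))
        for i in range(batch_size)
    ]


if __name__ == "__main__":
    tree = SumTree(capacity=4)
    for priority, name in [(1, "a"), (2, "b"), (3, "c"), (4, "d")]:
        tree.add(priority, name)
    print(tree.total())
    print([item for _, _, item in sample(tree, batch_size=2)])
```

Both an insertion and a draw cost O(log n) in the buffer capacity, which is the usual reason for preferring a sum tree over rescanning all priorities on every sample.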