An Improved TD3 Algorithm for 3D Path Planning of Robotic Arm

In military aviation, complex tasks pose challenges for robotic arm path planning. To address the low learning efficiency and low sample utilization of the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, an improved TD3 algorithm, Recurrent-TD3, is proposed. First, Long Short-Term Memory (LSTM) is integrated into the policy network and the value network to capture time-series information in aviation control tasks and strengthen the response to time-series changes, so that decisions take historical actions and states into account and the representation ability of the network improves. Then, Hindsight Experience Replay (HER) is integrated into the TD3 algorithm to address the difficulty of learning from sparse rewards: experience that failed to reach the goal is converted into experience that reaches a new goal, which makes more effective use of samples. Finally, a collision detection procedure based on bounding boxes is designed to improve the safety of the robotic arm in military aviation missions. Experiments show that, compared with other algorithms, the proposed algorithm finds a collision-free path faster and yields the shortest average path length.
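Two of the components named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the HER goal-selection strategy or the type of bounding box, so the "final" strategy (using the last achieved position as the new goal) and axis-aligned boxes are assumptions, and the names `Transition`, `sparse_reward`, `her_relabel_final`, `aabb_overlap`, and the tolerance `tol` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Transition:
    state: Vec3        # end-effector position before the step (assumed state layout)
    action: Vec3       # displacement command (assumed action layout)
    next_state: Vec3   # end-effector position after the step
    goal: Vec3         # goal the agent was pursuing
    reward: float      # sparse reward: 0.0 on success, -1.0 otherwise


def sparse_reward(achieved: Vec3, goal: Vec3, tol: float = 0.05) -> float:
    """Return 0.0 if `achieved` lies within `tol` of `goal`, else -1.0."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def her_relabel_final(episode: List[Transition]) -> List[Transition]:
    """Relabel an episode so that the position it actually reached is the goal.

    Assumption: the 'final' HER strategy. A failed episode becomes a
    successful one with respect to the substituted goal.
    """
    new_goal = episode[-1].next_state
    return [
        Transition(t.state, t.action, t.next_state, new_goal,
                   sparse_reward(t.next_state, new_goal))
        for t in episode
    ]


def aabb_overlap(min_a: Vec3, max_a: Vec3, min_b: Vec3, max_b: Vec3) -> bool:
    """Axis-aligned boxes intersect only if their intervals overlap on every axis."""
    return all(min_a[i] <= max_b[i] and min_b[i] <= max_a[i] for i in range(3))
```

In a replay buffer, the relabeled transitions would be stored alongside the original ones, so an episode that never reached its goal still contributes at least one transition with a success reward. The box test is the kind of check that could reject a candidate arm pose whose link boxes intersect an obstacle box.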

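robotic arm; path planning; TD3; long short-term memory (LSTM) network; hindsight experience replay (HER)

MA Tian (马天), LI Chao (李超), YANG Jiayi (杨嘉怡)

College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an 710000, China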

2025

Electronics Optics & Control (电光与控制)
Luoyang Institute of Electro-Optical Equipment, AVIC (中国航空工业洛阳电光设备研究所)

Peking University Core Journal (北大核心)
Impact factor: 0.424
ISSN: 1671-637X
Year, volume (issue): 2025, 32(1)