Time-optimal trajectory planning of an excavator based on deep reinforcement learning
For autonomous excavator operation, a time-optimal trajectory planning method based on reinforcement learning is proposed. The method builds a simulation environment to generate training data. The angles and angular velocities of the boom, arm, and bucket joints serve as the state observations, and the angular acceleration of each joint serves as the action; the simulation environment and the learning algorithm interact through these state observations. The reward function combines three terms: whether the motion of the boom, arm, and bucket joints exceeds the allowable range, the total time taken to complete the task, and the relative distance to the target. This reward is used to train the policy network parameters. Finally, an improved proximal policy optimization (PPO) algorithm realizes time-optimal trajectory planning for the excavator. Compared with other reinforcement learning algorithms for continuous action spaces, the experimental results show that the proposed algorithm is more efficient, converges faster, and produces smoother operating trajectories, which avoids large impacts on the joints and contributes to efficient and stable excavator operation.
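The state, action, and reward structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the joint limits, time step, reward weights, tolerance, and success bonus are hypothetical values chosen for illustration, and the distance to the target is assumed here to be measured in joint space.

```python
import math
from dataclasses import dataclass

# Hypothetical joint limits in radians for (boom, arm, bucket); not from the paper.
JOINT_LIMITS = [(-0.5, 1.0), (-2.5, -0.5), (-2.5, 0.5)]
DT = 0.05  # assumed simulation time step in seconds


@dataclass
class State:
    angles: list      # joint angles of boom, arm, bucket
    velocities: list  # joint angular velocities of boom, arm, bucket


def step(state, accelerations, dt=DT):
    """Apply the action (joint angular accelerations) for one time step."""
    velocities = [v + a * dt for v, a in zip(state.velocities, accelerations)]
    angles = [q + v * dt for q, v in zip(state.angles, velocities)]
    return State(angles, velocities)


def observation(state):
    """State observation: three joint angles followed by three velocities."""
    return list(state.angles) + list(state.velocities)


def reward(state, target_angles, w_limit=10.0, w_time=0.1, w_dist=1.0,
           tol=0.02, bonus=100.0):
    """Combine the three reward terms; all weights are illustrative assumptions."""
    out_of_range = any(
        not (lo <= q <= hi) for q, (lo, hi) in zip(state.angles, JOINT_LIMITS)
    )
    # Assumption: distance to the target is taken in joint space.
    distance = math.dist(state.angles, target_angles)
    # A constant per-step penalty makes shorter episodes score higher,
    # which is one common way to encode a total-time objective.
    r = -w_time - w_dist * distance
    if out_of_range:
        r -= w_limit
    reached = distance < tol and not out_of_range
    if reached:
        r += bonus
    done = out_of_range or reached
    return r, done
```

In a setup like this, a policy-gradient learner such as PPO would call `step` with the accelerations sampled from the policy network, then use the value returned by `reward` as the training signal for each transition.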