Reinforcement Learning Navigation Method Based on Advantage Hindsight Experience Replay
Reinforcement learning demonstrates significant potential in the field of mobile robots. By combining reinforcement learning algorithms with robot navigation, autonomous navigation can be achieved without prior knowledge. However, robot reinforcement learning suffers from several disadvantages, such as low sample utilization and poor generalization ability. Hence, based on the D3QN algorithm, this paper proposes an advantage hindsight experience replay algorithm for replaying experience samples. First, the advantage function value of each trajectory point in a trajectory sample is calculated, and the point with the maximum advantage function value is selected as the target point. Subsequently, the trajectory sample is relabeled, and both the old and new trajectory samples are placed into the experience pool to increase the diversity of experience samples, thus allowing the agent to learn to navigate to the target point more efficiently by learning from failed experience samples. To assess the validity of the proposed approach, different experimental environments are established on the Gazebo platform, and a TurtleBot3 robot is used to conduct navigation training and transfer tests in the simulation environment. The results show that the navigation success rate in the training environment is higher than that yielded by current mainstream algorithms, and that the maximum navigation success rate achieved in the transfer test environment is 86.33%. The improved algorithm enhances the utilization of navigation samples, reduces the difficulty of learning navigation strategies, and strengthens the robot's autonomous navigation ability and transfer generalization ability across different environments.
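The relabeling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy critic `q_fn` stands in for the trained D3QN network, the advantage is approximated as A(s, a) = Q(s, a) − mean over actions of Q(s, ·) (a common dueling-network convention; the paper's exact definition may differ), and the function and reward names are hypothetical.

```python
def advantage(q_values, action):
    # Approximate A(s, a) = Q(s, a) - V(s), taking V(s) as the mean of Q over actions
    # (one common dueling-network convention; an assumption, not the paper's definition).
    v = sum(q_values) / len(q_values)
    return q_values[action] - v

def relabel_with_max_advantage(trajectory, q_fn, reach_reward=1.0, step_reward=-0.01):
    """Hindsight relabeling of a failed episode.

    trajectory: list of (state, action, reward, next_state) transitions.
    Returns a relabeled copy whose goal is the visited point with maximum advantage.
    """
    # 1. Score every visited point by its advantage under the current critic.
    advs = [advantage(q_fn(s), a) for (s, a, _, _) in trajectory]
    # 2. The point with the maximum advantage becomes the substituted goal.
    goal_idx = max(range(len(advs)), key=advs.__getitem__)
    goal_state = trajectory[goal_idx][0]
    # 3. Relabel rewards as if goal_state had been the target all along.
    relabeled = []
    for i, (s, a, _, s2) in enumerate(trajectory[: goal_idx + 1]):
        r = reach_reward if i == goal_idx else step_reward
        relabeled.append((s, a, r, s2, goal_state))
    return relabeled

# Example: a failed 3-step episode (state, action, reward, next_state)
traj = [(0, 1, -0.01, 1), (1, 1, -0.01, 2), (2, 1, -0.01, 3)]

def q_fn(s):
    # Toy critic standing in for the trained D3QN (values are arbitrary).
    return [0.1 * s, 0.2 * s, -0.1 * s]

# Both the original and the relabeled trajectories enter the experience pool,
# increasing sample diversity as described in the abstract.
replay_buffer = list(traj)
replay_buffer.extend(relabel_with_max_advantage(traj, q_fn))
```

In this sketch the relabeled transitions carry the substituted goal explicitly, so a goal-conditioned value network could condition on it; a plain D3QN would instead fold the goal into the state representation.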