Deep reinforcement learning navigation algorithm combining advantage structure and minimum target Q-value
Existing deep reinforcement learning methods based on policy gradients suffer from long training times and low learning efficiency when applied to robot navigation in complex indoor scenes such as offices and corridors. This paper proposes a deep reinforcement learning navigation algorithm that combines an advantage structure with minimization of the target Q-value. The algorithm introduces the advantage structure into policy-gradient-based deep reinforcement learning to distinguish between actions sharing the same state value and thereby improve learning efficiency. In multi-target navigation scenarios, the method uses map information to estimate the state value separately, providing a more accurate value judgment. Because overestimation-mitigation methods designed for discrete control are difficult to apply within the mainstream Actor-Critic framework, a minimum target Q-value method based on Gaussian smoothing is designed to reduce the influence of overestimation on training. Experimental results show that the proposed algorithm effectively accelerates learning: in both single-target and multi-target continuous navigation training, its convergence speed exceeds that of SAC, TD3, and DDPG. The trained agent keeps the robot effectively away from obstacles and exhibits good generalization ability.
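The two ideas named above can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so the function names (`min_target_q`, `advantage`), the noise parameters (`noise_std`, `noise_clip`), and the TD3-style recipe of smoothing the target action with clipped Gaussian noise before taking the minimum over target critics are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)


def advantage(q_values, state_value):
    """Advantage structure: subtract the shared state value so that
    differences between actions in the same state become visible."""
    return np.asarray(q_values) - state_value


def min_target_q(next_state, target_actor, target_critics,
                 noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Gaussian-smoothed minimum target Q-value (hypothetical sketch).

    1. Take the target policy's action for the next state.
    2. Smooth it with clipped Gaussian noise (Gaussian smoothing).
    3. Evaluate every target critic at the smoothed action and
       return the minimum, which curbs overestimation in training.
    """
    a = target_actor(next_state)
    noise = np.clip(rng.normal(0.0, noise_std, size=np.shape(a)),
                    -noise_clip, noise_clip)
    a_smoothed = np.clip(a + noise, -act_limit, act_limit)
    return min(q(next_state, a_smoothed) for q in target_critics)
```

With two toy linear critics where one is always lower than the other, `min_target_q` returns the pessimistic estimate, which is the intended effect of taking the minimum.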