Robot Control Method Based on Double Limit Q Learning
Offline reinforcement learning, which can train satisfactory policies without any interaction between the agent and the environment, has developed rapidly in recent years. To alleviate extrapolation error and the over-conservatism of existing offline reinforcement learning algorithms, this paper proposes DIQL, an offline reinforcement learning algorithm based on double-limited Q-learning. The first limit constrains the Q-value network so that its estimates for out-of-distribution (OOD) actions do not stray far from the data-augmented state-value estimate V; the second limits the policy so that the mean squared error between its generated OOD actions and the dataset's action distribution does not grow too large. Under this double limit, exploration is encouraged, so the algorithm can achieve good results even when the dataset quality is poor. To verify the effectiveness of the algorithm, experiments are carried out in a gait-control environment for a bipedal 6-DOF robot. The results show that DIQL handles OOD actions effectively and alleviates both extrapolation error and over-conservatism.
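The two limits described above can be sketched as penalty terms added to the training loss. The sketch below is illustrative only: the function name, the hinge form of each penalty, and the margin/threshold parameters (`q_margin`, `mse_limit`) are assumptions, not the paper's actual loss.

```python
import numpy as np

def diql_penalties(q_ood, v_state, pi_actions, dataset_actions,
                   q_margin=1.0, mse_limit=0.5):
    """Hypothetical sketch of DIQL's double limit as two penalty terms.

    Limit 1: Q-value estimates for OOD actions are penalized when they
    exceed the (data-augmented) state-value estimate V by more than
    q_margin.
    Limit 2: the policy's generated actions are penalized when their
    mean squared error to the dataset actions exceeds mse_limit.
    """
    # Penalty 1: hinge on Q(s, a_ood) straying above V(s) + margin
    q_penalty = np.mean(np.maximum(q_ood - (v_state + q_margin), 0.0) ** 2)
    # Penalty 2: hinge on the policy/dataset action MSE exceeding its limit
    action_mse = np.mean((pi_actions - dataset_actions) ** 2)
    mse_penalty = max(action_mse - mse_limit, 0.0)
    return q_penalty, mse_penalty

# Example: one Q estimate above the allowed band, actions far from the data
q_p, m_p = diql_penalties(np.array([3.0, 1.0]), np.array([1.0, 1.0]),
                          np.array([0.0, 0.0]), np.array([1.0, 1.0]))
```

Because both penalties vanish inside their allowed bands, the agent is free to explore as long as it stays within the double limit, which is how the constraints can coexist with the exploration encouraged in the abstract.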