Safe Reinforcement Learning Method Based on Robust Cross-Entropy and Gradient Optimization
Ensuring safety and efficiency when intelligent agents perform tasks in complex environments is a major challenge. Traditional approaches use model-free reinforcement learning to solve intelligent decision-making problems, finding the optimal policy through extensive trial and error over large amounts of data; they ignore the agent's training cost and safety risks and therefore cannot effectively guarantee safe decision-making. To this end, safety constraints are imposed on the agent's actions within a model predictive control framework, and a safe reinforcement learning algorithm is designed to obtain the safest action control sequence. At the same time, to address the high computational complexity and low efficiency of the cross-entropy method, as well as the tendency of gradient-based optimization to fall into local optima, a combination of robust cross-entropy and gradient optimization is used to optimize the action control sequence, improving both the safety and the solving efficiency of the algorithm. Experiments show that, compared with the robust cross-entropy method, the proposed method converges faster, and compared with other optimization algorithms it achieves the best safety performance without sacrificing much task performance.
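The abstract does not give the algorithm's details, but the core idea it describes, sampling-based cross-entropy optimization of an action sequence with constraint-violation penalties, refined by gradient steps on the elite mean, can be sketched as follows. The `cost` and `constraint_violation` functions below are hypothetical stand-ins for the paper's learned model and safety constraints, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

# Hypothetical smoothness-plus-magnitude cost; the paper's actual model
# predictive cost is not given in the abstract.
def cost(seq):
    return np.sum(seq ** 2) + np.sum(np.abs(np.diff(seq)))

# Hypothetical box safety constraint: actions must stay within [-limit, limit].
def constraint_violation(seq, limit=0.8):
    return np.sum(np.maximum(np.abs(seq) - limit, 0.0))  # 0 when safe

def grad_cost(seq, eps=1e-4):
    # Finite-difference gradient (placeholder for analytic gradients).
    g = np.zeros_like(seq)
    for i in range(seq.size):
        d = np.zeros_like(seq)
        d[i] = eps
        g[i] = (cost(seq + d) - cost(seq - d)) / (2 * eps)
    return g

def safe_cem_with_gradient(horizon=10, iters=30, pop=200, elite_frac=0.1,
                           penalty=100.0, lr=0.05, seed=0):
    """Cross-entropy search over action sequences with a large penalty on
    constraint violations, plus gradient refinement of the elite mean."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(horizon), np.ones(horizon)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=(pop, horizon))
        # Penalized objective: unsafe sequences are ranked last.
        scores = np.array([cost(s) + penalty * constraint_violation(s)
                           for s in samples])
        elites = samples[np.argsort(scores)[:n_elite]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
        # Gradient step on the elite mean to accelerate convergence,
        # the hybrid refinement the abstract alludes to.
        mu = mu - lr * grad_cost(mu)
    return mu
```

This combines the global exploration of sampling (which avoids the local optima pure gradient descent can get stuck in) with the fast local convergence of gradient steps, mirroring the trade-off the abstract describes.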