

Safe Reinforcement Learning Method Based on Robust Cross-Entropy and Gradient Optimization
Ensuring safety and efficiency when intelligent agents perform tasks in complex environments is a major challenge. Traditional approaches solve agent decision-making problems with model-free reinforcement learning, relying on repeated trial and error over large amounts of data to find an optimal policy. This ignores the agent's training cost and safety risks, and therefore cannot effectively guarantee the safety of decisions. To this end, safety constraints are imposed on the agent's actions within a model predictive control framework, and a safe reinforcement learning algorithm is designed to obtain the safest action control sequence. Meanwhile, to address the heavy computation and low efficiency of the cross-entropy method, as well as the tendency of gradient-based optimization to become trapped in local optima, robust cross-entropy and gradient optimization are combined to optimize the action control sequence, improving both safety and solution efficiency. Experiments show that the proposed method converges faster than the robust cross-entropy method and, compared with other optimization algorithms, achieves the best safety performance without sacrificing much task performance.
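The combination the abstract describes, cross-entropy sampling plus gradient refinement of an action control sequence with safety treated as a constraint, can be sketched on a toy problem. Everything below (the quadratic cost, the penalty weight, the horizon, the safety limit) is an illustrative assumption, not the paper's actual formulation.

```python
import numpy as np

# Toy sketch: cross-entropy method (CEM) over an action sequence, with a
# gradient step on the elite mean each iteration to speed up convergence.
# Safety is handled here as a soft penalty on actions outside a limit;
# all constants below are hypothetical, chosen only for illustration.

HORIZON = 10          # length of the action control sequence
N_SAMPLES = 200       # candidate sequences sampled per CEM iteration
N_ELITE = 20          # elites kept to refit the sampling distribution
SAFE_LIMIT = 0.8      # |action| beyond this is treated as unsafe

def cost(seq):
    """Quadratic tracking cost plus a penalty on unsafe actions."""
    track = np.sum((seq - 0.5) ** 2)
    unsafe = np.sum(np.maximum(np.abs(seq) - SAFE_LIMIT, 0.0) ** 2)
    return track + 100.0 * unsafe

def grad_cost(seq):
    """Analytic gradient of cost(); finite differences would also work."""
    g = 2.0 * (seq - 0.5)
    over = np.maximum(np.abs(seq) - SAFE_LIMIT, 0.0)
    return g + 100.0 * 2.0 * over * np.sign(seq)

def cem_with_gradient(iters=30, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(HORIZON), np.ones(HORIZON)
    for _ in range(iters):
        # CEM step: sample candidates, rank by cost, refit mean/std to elites.
        samples = rng.normal(mu, sigma, size=(N_SAMPLES, HORIZON))
        elites = samples[np.argsort([cost(s) for s in samples])[:N_ELITE]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3
        # Gradient step: locally refine the elite mean.
        mu = mu - lr * grad_cost(mu)
    return mu

best = cem_with_gradient()
print(cost(best))  # small: all actions settle near 0.5, inside the safety limit
```

In this sketch the CEM step provides global exploration of the sequence space while the gradient step accelerates local convergence, mirroring the division of labor the abstract attributes to the two methods.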

reinforcement learning; robust cross-entropy; gradient optimization; safety

周娴玮, 张锟, 叶鑫


School of Software, South China Normal University, Foshan 538200, Guangdong, China


2024

软件导刊 (Software Guide)
湖北省信息学会 (Hubei Information Society)


Impact factor: 0.524
ISSN: 1672-7800
Year, volume (issue): 2024, 23(9)