Abstract

Safety is an essential capability for the large-scale deployment of reinforcement-learning-based intelligent control and decision-making. This paper aims to train an agent using only data collected by safe policies (i.e., policies that confine the agent's actions within a safe region) while guaranteeing the safety of the finally learned policy. To meet this requirement, we adopt stability analysis from control theory and optimize the policy learned by reinforcement learning under a uniform ultimate boundedness constraint. Specifically, the proposed method provides an effective way to learn the system dynamics model and a Lyapunov function, and uses them to analyze the stability of the closed-loop system without driving the agent outside the safe region. Furthermore, this paper shows how to gradually expand the safe region while improving policy performance and, on this basis, presents a practical and effective algorithm that guarantees closed-loop stability of the policy both during and after training. Finally, the theoretical results are verified in simulation on an inverted pendulum, i.e., by showing how the reinforcement-learning policy can be optimized without the pendulum falling over.
Abstract
Safety is an essential property that enables the further extensive application of reinforcement learning. This paper introduces a framework for safe model-based reinforcement learning that employs classic Lyapunov methods (uniform ultimate boundedness) from control theory, providing safety guarantees during both training and deployment without an intervention mechanism. More specifically, an efficient way is presented to collect data and learn the dynamics model within a safe region defined by iteratively updated Lyapunov functions. On this basis, this paper proposes a practical and effective algorithm capable of gradually expanding the safe region while improving the control performance. Finally, illustrative examples on an inverted pendulum demonstrate the necessity and validity of the obtained policy.
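The core idea summarized above (certifying a safe region with a Lyapunov function and keeping the system inside it) can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses a discrete-time linearized inverted pendulum with a fixed stabilizing feedback gain in place of a learned policy, and every constant below (`dt`, `K`, the sub-level threshold `c`) is an assumption chosen only for illustration.

```python
import numpy as np

# Illustrative sketch: a quadratic Lyapunov function V(x) = x' P x
# certifies a safe sub-level set {x : V(x) <= c} for a linearized
# inverted pendulum under a fixed feedback u = -K x.
dt, g, l = 0.01, 9.81, 1.0             # step size, gravity, pole length (assumed)
A = np.array([[1.0, dt],               # discrete linearized pendulum,
              [g / l * dt, 1.0]])      # state x = (angle, angular rate)
B = np.array([[0.0], [dt]])
K = np.array([[25.0, 8.0]])            # assumed stabilizing gain
Acl = A - B @ K                        # closed-loop dynamics: x_next = Acl x

# Solve the discrete Lyapunov equation Acl' P Acl - P = -Q via the
# convergent series P = sum_k (Acl^k)' Q (Acl^k).
Q = np.eye(2)
P = Q.copy()
M = Acl.copy()
for _ in range(500):
    P += M.T @ Q @ M
    M = Acl @ M

def V(x):
    """Lyapunov function value V(x) = x' P x."""
    return float(x @ P @ x)

def is_safe(x, c=1.0):
    """A state lies in the certified safe region iff V(x) <= c."""
    return V(x) <= c

# V decreases along closed-loop trajectories, so the sub-level set is
# forward invariant: trajectories starting inside never leave it.
x = np.array([0.05, 0.0])
for _ in range(200):
    x_next = Acl @ x
    assert V(x_next) < V(x)
    x = x_next
```

Because the sub-level set is forward invariant, any data collected from states inside it stays inside it, which is the sense in which training data can be gathered "safely"; the paper's contribution is doing this with a learned model and learned Lyapunov function while enlarging the set.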