Conditional neural processes for model-based reinforcement learning with stability guarantees
Safety is an essential property that enables the wider application of reinforcement learning. This paper introduces a framework for safe model-based reinforcement learning that employs classic Lyapunov methods (uniform ultimate boundedness) from control theory to provide safety guarantees during both training and deployment, without relying on an intervention mechanism. More specifically, an efficient way is presented to collect data and learn the dynamics models within a safe region defined by iteratively updated Lyapunov functions. On this basis, the paper proposes a practical and effective algorithm capable of gradually expanding the safe region while improving control performance. Finally, illustrative examples on an inverted pendulum demonstrate the necessity and validity of the obtained policy.
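As a point of reference, the following is a minimal sketch of how a Lyapunov function can define a safe region and certify uniform ultimate boundedness for the closed-loop system; the symbols $x$, $V$, $c$, $f$, $\pi$, $\alpha$, and $\mu$ are illustrative assumptions and not the paper's own notation:
\[
\mathcal{S}_c = \{\, x : V(x) \le c \,\}, \qquad
V\bigl(f(x,\pi(x))\bigr) - V(x) \le -\alpha\bigl(\lVert x \rVert\bigr)
\quad \text{for all } x \in \mathcal{S}_c \text{ with } \lVert x \rVert \ge \mu .
\]
Under a decrease condition of this form, trajectories starting inside the sublevel set $\mathcal{S}_c$ remain in it and are ultimately confined to a small neighbourhood of the equilibrium, which is the sense in which data collection and exploration can be restricted to a safe region that is then gradually enlarged.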