Abstract

Safety is an essential capability for the large-scale deployment of reinforcement-learning-based intelligent control and decision-making. This paper aims to train an agent using only data collected by safe policies (i.e., policies that confine the agent's actions within a safe region) while guaranteeing the safety of the finally learned policy. To meet this requirement, we adopt stability analysis from control theory and optimize the policy learned by reinforcement learning under a uniform ultimate boundedness constraint. Specifically, the proposed method provides an effective way to learn the system dynamics model and a Lyapunov function, and uses them to analyze the stability of the closed-loop system without driving the agent outside the safe region. Furthermore, this paper shows how to gradually expand the safe region while improving policy performance and, on this basis, presents a practical and effective algorithm that guarantees closed-loop stability of the policy both during and after training. Finally, the theoretical results are verified in simulation on an inverted pendulum, i.e., by showing how the reinforcement-learning policy can be optimized without the pendulum falling over.
Abstract
Safety is an essential property that enables the further extensive application of reinforcement learning. This paper introduces a framework for safe model-based reinforcement learning that employs classic Lyapunov methods (uniform ultimate boundedness) from control theory, providing safety guarantees during both training and deployment without an intervention mechanism. More specifically, an efficient way is presented to collect data and learn the dynamics model within a safe region defined by iteratively updated Lyapunov functions. On this basis, this paper proposes a practical and effective algorithm capable of gradually expanding the safe region while improving the control performance. Finally, illustrative examples on an inverted pendulum demonstrate the necessity and validity of the obtained policy.
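The core idea summarized above (certifying a safe region with a Lyapunov function and keeping the system inside it) can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses a discrete-time linearized inverted pendulum with a fixed stabilizing feedback gain in place of a learned policy, and every constant below (`dt`, `K`, the sub-level threshold `c`) is an assumption chosen only for illustration.

```python
import numpy as np

# Illustrative sketch: a quadratic Lyapunov function V(x) = x' P x
# certifies a safe sub-level set {x : V(x) <= c} for a linearized
# inverted pendulum under a fixed feedback u = -K x.
dt, g, l = 0.01, 9.81, 1.0             # step size, gravity, pole length (assumed)
A = np.array([[1.0, dt],               # discrete linearized pendulum,
              [g / l * dt, 1.0]])      # state x = (angle, angular rate)
B = np.array([[0.0], [dt]])
K = np.array([[25.0, 8.0]])            # assumed stabilizing gain
Acl = A - B @ K                        # closed-loop dynamics: x_next = Acl x

# Solve the discrete Lyapunov equation Acl' P Acl - P = -Q via the
# convergent series P = sum_k (Acl^k)' Q (Acl^k).
Q = np.eye(2)
P = Q.copy()
M = Acl.copy()
for _ in range(500):
    P += M.T @ Q @ M
    M = Acl @ M

def V(x):
    """Lyapunov function value V(x) = x' P x."""
    return float(x @ P @ x)

def is_safe(x, c=1.0):
    """A state lies in the certified safe region iff V(x) <= c."""
    return V(x) <= c

# V decreases along closed-loop trajectories, so the sub-level set is
# forward invariant: trajectories starting inside never leave it.
x = np.array([0.05, 0.0])
for _ in range(200):
    x_next = Acl @ x
    assert V(x_next) < V(x)
    x = x_next
```

Because the sub-level set is forward invariant, any data collected from states inside it stays inside it, which is the sense in which training data can be gathered "safely"; the paper's contribution is doing this with a learned model and learned Lyapunov function while enlarging the set.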