Multi-agent reinforcement learning behavioral control for nonlinear second-order systems
Reinforcement learning behavioral control (RLBC) is limited to an individual agent without any swarm mission, because it models behavior priority learning as a Markov decision process. In this paper, a novel multi-agent reinforcement learning behavioral control (MARLBC) method is proposed to overcome this limitation by implementing joint learning. Specifically, a multi-agent reinforcement learning mission supervisor (MARLMS) is designed for a group of nonlinear second-order systems to assign behavior priorities at the decision layer. By modeling behavior priority switching as a cooperative Markov game, the MARLMS learns an optimal joint behavior priority, reducing dependence on human intelligence and high-performance computing hardware. At the control layer, a group of second-order reinforcement learning controllers is designed to learn optimal control policies that track position and velocity signals simultaneously. In particular, input saturation constraints are strictly enforced by designing a group of adaptive compensators. Numerical simulation results show that the proposed MARLBC method has a lower switching frequency and control cost than finite-time and fixed-time behavioral control and RLBC methods.
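
To make the decision-layer idea concrete, the sketch below casts behavior-priority switching as a cooperative Markov game solved by joint Q-learning, as the abstract describes. It is a minimal illustration, not the authors' implementation: the mission-state encoding, reward values, switching penalty, and all sizes are assumptions introduced here.

```python
import numpy as np

# Minimal sketch (assumptions throughout): each agent's mission supervisor
# picks a behavior priority (e.g., 0 = "avoid collision first",
# 1 = "track target first"). The game is cooperative, so all agents share
# one team reward and a single Q-table over the joint action suffices here.

N_AGENTS = 2          # size of the group of second-order systems (assumed)
N_STATES = 4          # coarse mission-state encoding (assumed)
N_PRIORITIES = 2      # candidate behavior priorities per agent (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

Q = np.zeros((N_STATES, N_PRIORITIES ** N_AGENTS))

def joint_to_tuple(a):
    """Decode a joint-action index into one priority per agent."""
    return tuple((a // N_PRIORITIES**i) % N_PRIORITIES for i in range(N_AGENTS))

def team_reward(state, joint_a, prev_joint_a):
    """Toy shared reward: favor the priority matching the mission state and
    penalize switching, mirroring the low-switching-frequency objective."""
    prios = joint_to_tuple(joint_a)
    r = sum(1.0 if p == state % N_PRIORITIES else -0.2 for p in prios)
    if prev_joint_a is not None and joint_a != prev_joint_a:
        r -= 0.5  # switching cost (assumed value)
    return r

def step(state):
    """Toy mission dynamics: the situation drifts at random."""
    return (state + rng.integers(0, 2)) % N_STATES

state, prev_a = 0, None
for episode in range(2000):
    # epsilon-greedy selection over the joint priority action
    a = rng.integers(Q.shape[1]) if rng.random() < EPS else int(np.argmax(Q[state]))
    r = team_reward(state, a, prev_a)
    nxt = step(state)
    # standard Q-learning update on the shared table
    Q[state, a] += ALPHA * (r + GAMMA * Q[nxt].max() - Q[state, a])
    state, prev_a = nxt, a

print("learned joint priorities per state:",
      {s: joint_to_tuple(int(np.argmax(Q[s]))) for s in range(N_STATES)})
```

Because the reward is shared, one table over the joint action is enough for this toy; the paper's MARLMS would instead operate on the actual mission states of the nonlinear second-order systems, with the control layer tracking the resulting reference signals.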

Reinforcement learning; Behavioral control; Second-order systems; Mission supervisor

Zhenyi ZHANG, Jie HUANG, Congjie PAN


College of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, China

5G+ Industrial Internet Research Institute, Fuzhou University, Fuzhou 350108, China


National Natural Science Foundation of China (No. 92367109)


Frontiers of Information Technology & Electronic Engineering
Zhejiang University


CSTPCD
Impact factor: 0.371
ISSN: 2095-9184
Year, volume (issue): 2024, 25(6)