Multi-agent reinforcement learning behavioral control for nonlinear second-order systems
Reinforcement learning behavioral control (RLBC) is limited to an individual agent without any swarm mission, because it models behavior priority learning as a Markov decision process. In this paper, a novel multi-agent reinforcement learning behavioral control (MARLBC) method is proposed to overcome this limitation by implementing joint learning. Specifically, a multi-agent reinforcement learning mission supervisor (MARLMS) is designed for a group of nonlinear second-order systems to assign behavior priorities at the decision layer. By modeling behavior priority switching as a cooperative Markov game, the MARLMS learns an optimal joint behavior priority, reducing dependence on human intelligence and high-performance computing hardware. At the control layer, a group of second-order reinforcement learning controllers is designed to learn optimal control policies that track position and velocity signals simultaneously. In particular, input saturation constraints are strictly enforced by designing a group of adaptive compensators. Numerical simulation results show that the proposed MARLBC method has a lower switching frequency and control cost than finite-time and fixed-time behavioral control and RLBC methods.
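
To make the decision-layer idea concrete, the sketch below casts behavior-priority switching as a cooperative Markov game solved by joint Q-learning, as the abstract describes. It is a minimal illustration, not the authors' implementation: the mission-state encoding, reward values, switching penalty, and all sizes are assumptions introduced here.

```python
import numpy as np

# Minimal sketch (assumptions throughout): each agent's mission supervisor
# picks a behavior priority (e.g., 0 = "avoid collision first",
# 1 = "track target first"). The game is cooperative, so all agents share
# one team reward and a single Q-table over the joint action suffices here.

N_AGENTS = 2          # size of the group of second-order systems (assumed)
N_STATES = 4          # coarse mission-state encoding (assumed)
N_PRIORITIES = 2      # candidate behavior priorities per agent (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

Q = np.zeros((N_STATES, N_PRIORITIES ** N_AGENTS))

def joint_to_tuple(a):
    """Decode a joint-action index into one priority per agent."""
    return tuple((a // N_PRIORITIES**i) % N_PRIORITIES for i in range(N_AGENTS))

def team_reward(state, joint_a, prev_joint_a):
    """Toy shared reward: favor the priority matching the mission state and
    penalize switching, mirroring the low-switching-frequency objective."""
    prios = joint_to_tuple(joint_a)
    r = sum(1.0 if p == state % N_PRIORITIES else -0.2 for p in prios)
    if prev_joint_a is not None and joint_a != prev_joint_a:
        r -= 0.5  # switching cost (assumed value)
    return r

def step(state):
    """Toy mission dynamics: the situation drifts at random."""
    return (state + rng.integers(0, 2)) % N_STATES

state, prev_a = 0, None
for episode in range(2000):
    # epsilon-greedy selection over the joint priority action
    a = rng.integers(Q.shape[1]) if rng.random() < EPS else int(np.argmax(Q[state]))
    r = team_reward(state, a, prev_a)
    nxt = step(state)
    # standard Q-learning update on the shared table
    Q[state, a] += ALPHA * (r + GAMMA * Q[nxt].max() - Q[state, a])
    state, prev_a = nxt, a

print("learned joint priorities per state:",
      {s: joint_to_tuple(int(np.argmax(Q[s]))) for s in range(N_STATES)})
```

Because the reward is shared, one table over the joint action is enough for this toy; the paper's MARLMS would instead operate on the actual mission states of the nonlinear second-order systems, with the control layer tracking the resulting reference signals.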

Reinforcement learning; Behavioral control; Second-order systems; Mission supervisor

Zhenyi ZHANG, Jie HUANG, Congjie PAN


College of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, China

5G+ Industrial Internet Research Institute, Fuzhou University, Fuzhou 350108, China


National Natural Science Foundation of China (No. 92367109)


Frontiers of Information Technology & Electronic Engineering
Zhejiang University


CSTPCD
Impact factor: 0.371
ISSN: 2095-9184
Year, volume (issue): 2024, 25(6)