Zero-sum Game-based Fault-tolerant Tracking Control of Quadrotor Unmanned Aerial Vehicle Using Reinforcement Learning

Original Chinese title: 基于零和博弈的四旋翼无人机强化学习容错跟踪控制

Abstract: This paper investigates trajectory tracking for a quadrotor unmanned aerial vehicle (UAV) with unknown dynamics under deception attacks and proposes a reinforcement learning fault-tolerant control strategy based on a zero-sum game framework. First, the error dynamics of the system are established from the quadrotor UAV model and an intermediate control law. Then, within the zero-sum game framework, the control input and the deception attack are designed as opposing strategies, and the cost function is minimized so that the quadrotor UAV achieves effective fault-tolerant control in the presence of deception attacks. Next, an actor-critic neural network algorithm based on reinforcement learning is developed to update the strategies dynamically and reach the Nash equilibrium of the zero-sum game. Stability analysis proves that all signals in the closed-loop system remain bounded under the proposed control algorithm. Finally, simulation results validate the effectiveness and adaptability of the proposed zero-sum game-based reinforcement learning fault-tolerant trajectory tracking control algorithm, and the scheme improves fault-tolerance performance by 10%.
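
To make the notion of a zero-sum Nash (saddle-point) equilibrium between a control input and an attack concrete, the following is a minimal sketch on a scalar linear-quadratic game. It is not the paper's algorithm: the paper treats unknown quadrotor dynamics and learns the strategies with actor-critic networks, whereas this sketch assumes a known scalar model dx/dt = a*x + b*u + d*w with cost integrand q*x^2 + r*u^2 - gamma^2*w^2. All names and parameters (`a`, `b`, `d`, `q`, `r`, `gamma`, `solve_scalar_game`, `cost`) are hypothetical and chosen only for illustration.

```python
import math


def solve_scalar_game(a, b, d, q, r, gamma):
    """Solve the scalar game Riccati equation 2*a*p + q - s*p**2 = 0 for p > 0.

    Assumed model (illustrative only): dx/dt = a*x + b*u + d*w,
    cost = integral of q*x^2 + r*u^2 - gamma^2*w^2.
    The control u minimizes the cost, the attack w maximizes it.
    """
    s = b**2 / r - d**2 / gamma**2
    if s <= 0:
        raise ValueError("s <= 0: this sketch only covers the case b^2/r > d^2/gamma^2")
    p = (a + math.sqrt(a**2 + q * s)) / s  # positive root, value function V(x) = p*x^2
    k_u = -b * p / r         # minimizing control gain, u = k_u * x
    k_w = d * p / gamma**2   # maximizing attack gain,  w = k_w * x
    return p, k_u, k_w


def cost(a, b, d, q, r, gamma, k_u, k_w, x0=1.0):
    """Closed-form cost of the linear policies u = k_u*x and w = k_w*x."""
    a_cl = a + b * k_u + d * k_w
    if a_cl >= 0:
        raise ValueError("closed loop is not stable; the cost integral does not converge")
    # x(t) = x0*exp(a_cl*t), so the integral of x(t)^2 over [0, inf) is x0^2 / (-2*a_cl)
    return (q + r * k_u**2 - gamma**2 * k_w**2) * x0**2 / (-2.0 * a_cl)


params = dict(a=1.0, b=1.0, d=0.5, q=1.0, r=1.0, gamma=2.0)
p, k_u, k_w = solve_scalar_game(**params)
j_star = cost(**params, k_u=k_u, k_w=k_w)
j_worse_control = cost(**params, k_u=k_u + 0.1, k_w=k_w)   # controller deviates
j_weaker_attack = cost(**params, k_u=k_u, k_w=k_w + 0.1)   # attacker deviates
print(j_weaker_attack <= j_star <= j_worse_control)
```

At the equilibrium pair the cost equals p*x0^2; a unilateral change in the control gain raises the cost and a unilateral change in the attack gain lowers it, which is the saddle-point property that a learning scheme of the kind described in the abstract would aim to approximate when the model is not available.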

Keywords:

Quadrotor Unmanned Aerial Vehicle; Trajectory Tracking; Zero-sum Game; Reinforcement Learning; Deception Attacks; Fault-tolerant Control

Authors:

Xu Xinfeng (徐鑫峰), Liu Chun (柳春), Huang Xiao (黄骁), Meng Yizhen (孟亦真), Wang Qiang (王强)

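Author affiliations:

School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444

China Ship Development and Design Center, Wuhan 430064

Shanghai Aerospace Electronic Technology Institute, Shanghai 201109

Shanghai Key Laboratory of Aerospace Intelligent Control Technology, Shanghai 201109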

Publication year:

2024

DOI:

10.19942/j.issn.2096-5915.2024.06.56

Journal:

Unmanned Systems Technology (无人系统技术)

ISSN: 2096-5915

Year, volume (issue): 2024, 7(6)