Unmanned Systems Technology, 2024, Vol. 7, Issue 6: 19-29. DOI: 10.19942/j.issn.2096-5915.2024.06.56

Zero-sum Game-based Fault-tolerant Tracking Control of Quadrotor Unmanned Aerial Vehicle Using Reinforcement Learning

Xu Xinfeng¹, Liu Chun¹, Huang Xiao², Meng Yizhen³, Wang Qiang¹

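Author information

  • 1. School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444
  • 2. China Ship Development and Design Center, Wuhan 430064
  • 3. Shanghai Aerospace Electronic Technology Institute, Shanghai 201109; Shanghai Key Laboratory of Aerospace Intelligent Control Technology, Shanghai 201109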

Abstract

This paper investigates trajectory tracking for a quadrotor unmanned aerial vehicle (UAV) with unknown dynamics under deception attacks and proposes a reinforcement learning fault-tolerant control strategy built on a zero-sum game framework. First, the error dynamics of the system are established from the quadrotor UAV model and an intermediate control law. Then, within the zero-sum game framework, adversarial strategies are designed for the control input and the deception attack, and the cost function is minimized so that the quadrotor UAV achieves effective fault-tolerant control in the presence of deception attacks. Next, an actor-critic neural network algorithm based on reinforcement learning is developed to adjust the strategies dynamically and reach the Nash equilibrium of the zero-sum game. Stability analysis proves that all signals in the closed-loop system remain bounded under the proposed control algorithm. Finally, simulation results validate the effectiveness and adaptability of the proposed zero-sum game-based reinforcement learning fault-tolerant trajectory tracking control algorithm, and the scheme improves fault-tolerant performance by 10%.
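To make the zero-sum game idea concrete, the following is a minimal sketch and not the paper's method. It assumes a hypothetical scalar linear error model de/dt = a*e + b*u + k*d, where u is the control input and d is the deception attack, with the cost integral of (q*e^2 + r*u^2 - gamma^2*d^2). All symbols (a, b, k, q, r, gamma) and the function name are illustrative assumptions. For this toy case the saddle point (Nash equilibrium) can be computed in closed form from a quadratic value function V(e) = p*e^2; the abstract indicates that the paper instead approximates the equilibrium with actor-critic neural networks because the quadrotor dynamics are unknown.

```python
import math


def solve_scalar_zero_sum_game(a, b, k, q, r, gamma):
    """Saddle-point gains for the assumed toy model de/dt = a*e + b*u + k*d.

    Cost: integral of (q*e^2 + r*u^2 - gamma^2*d^2) dt, minimized by the
    control u and maximized by the attack d. Assumes V(e) = p*e^2.
    Illustrative only; this is not the quadrotor model from the paper.
    """
    # Net authority of the controller relative to the attacker.
    s = b**2 / r - k**2 / gamma**2
    if s <= 0:
        raise ValueError("this sketch requires b^2/r > k^2/gamma^2")
    # Positive root of the scalar game Riccati equation 2*a*p + q - s*p^2 = 0.
    p = (a + math.sqrt(a**2 + q * s)) / s
    control_gain = -b * p / r        # u* = control_gain * e (minimizing player)
    attack_gain = k * p / gamma**2   # d* = attack_gain * e (maximizing player)
    return p, control_gain, attack_gain


# Example with assumed numbers: an unstable open-loop error model (a > 0).
a, b, k, q, r, gamma = 1.0, 1.0, 0.5, 1.0, 1.0, 2.0
p, control_gain, attack_gain = solve_scalar_zero_sum_game(a, b, k, q, r, gamma)

# Closed-loop pole when both players apply their equilibrium strategies.
closed_loop_pole = a + b * control_gain + k * attack_gain
```

In this toy setting the closed-loop pole equals -sqrt(a^2 + q*s) with s = b^2/r - k^2/gamma^2, so the tracking error decays even under the worst-case attack, provided the controller's authority exceeds the attacker's.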

Key words

Quadrotor Unmanned Aerial Vehicle / Trajectory Tracking / Zero-sum Game / Reinforcement Learning / Deception Attacks / Fault-tolerant Control

Publication year

2024