Optimal Secure Tracking Control in Multi-UAVs Based on Online Reinforcement Learning

Gong Zhenyu (弓镇宇)¹, Yang Feisheng (杨飞生)¹

Author information

1. Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
Abstract

In unmanned aerial vehicle (UAV) formation tracking missions, a false data injection (FDI) attacker can inject misleading data into the control commands and thereby prevent the UAVs from forming the specified formation, so a secure formation tracking controller is needed. To this end, the attack-defense process is modeled as a zero-sum graphical game in which the FDI attacker and the secure controller are the players: the attacker seeks to maximize a prescribed cost function, whereas the secure controller seeks to minimize it. Solving the game and obtaining the optimal secure control policy requires the solution of the Hamilton-Jacobi-Isaacs (HJI) equation. Because the HJI equation is a coupled partial differential equation that is difficult to solve directly, a finite-time convergent online reinforcement learning algorithm combined with an experience replay mechanism is introduced, and a critic-only neural network is designed to approximate the value function and obtain the optimal secure control policy. Finally, a numerical simulation verifies the effectiveness of the proposed algorithm.
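The critic-only idea summarized in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a single scalar tracking error `e` rather than a multi-UAV graph, a one-weight critic `V(e) = W * e**2`, and a standard normalized gradient update in place of the finite-time convergent law. The dynamics coefficients `a`, `b`, `d`, the cost weights `q`, `r`, `gamma`, and all function names are hypothetical choices made for illustration. In this scalar case the HJI equation reduces to a quadratic in `W`, so the learned weight can be compared with a closed-form root.

```python
import numpy as np

# Hypothetical scalar error dynamics: e_dot = a*e + b*u + d*w,
# with u the secure control input and w the injected FDI signal.
a, b, d = -1.0, 1.0, 0.5
# Hypothetical weights of the running cost q*e^2 + r*u^2 - gamma^2*w^2.
q, r, gamma = 1.0, 1.0, 1.0


def policies(W, e):
    """Saddle-point policies implied by the critic V(e) = W * e**2."""
    grad_V = 2.0 * W * e
    u = -b * grad_V / (2.0 * r)            # controller minimizes the cost
    w = d * grad_V / (2.0 * gamma**2)      # attacker maximizes the cost
    return u, w


def hji_residual(W, e):
    """HJI residual of the critic at state(s) e, plus the regressor sigma."""
    u, w = policies(W, e)
    sigma = 2.0 * e * (a * e + b * u + d * w)   # dphi/de * e_dot, phi = e^2
    residual = q * e**2 + r * u**2 - gamma**2 * w**2 + W * sigma
    return residual, sigma


def train(W0=0.0, e0=2.0, dt=0.01, steps=5000, alpha=50.0, buffer_size=20):
    """Tune the critic weight online along one closed-loop trajectory."""
    W, e, buffer = W0, e0, []
    for _ in range(steps):
        # Experience replay: keep a few well-separated past states so the
        # update stays informative after e has decayed toward zero.
        if len(buffer) < buffer_size and (not buffer or abs(e - buffer[-1]) > 0.1):
            buffer.append(e)
        samples = np.array(buffer + [e])
        residual, sigma = hji_residual(W, samples)
        # Normalized gradient step on the squared residual.
        W -= alpha * dt * np.mean(sigma * residual / (1.0 + sigma**2) ** 2)
        # Euler step of the closed loop under the current policies.
        u, w = policies(W, e)
        e += dt * (a * e + b * u + d * w)
    return float(W)


# Closed-form positive root of the scalar HJI (game Riccati) equation
# q + 2*a*W - c*W**2 = 0, used here only as a reference value.
c = b**2 / r - d**2 / gamma**2
p_star = (a + np.sqrt(a**2 + c * q)) / c
W_hat = train()
print(f"learned W = {W_hat:.6f}, closed-form p* = {p_star:.6f}")
```

The replay buffer is what keeps the update informative here: once the closed loop has driven `e` close to zero, the current sample carries almost no information about `W`, while the stored states still do.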

Key words

FDI attack / multi-UAVs / online reinforcement learning / optimal control / zero-sum graphical game

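Funding

National Natural Science Foundation of China (62073269)

Aeronautical Science Foundation of China (2020Z034053002)

Key Research and Development Program of Shaanxi Province (2022GY-244)

Natural Science Foundation of Chongqing (CSTB2022NSCQ-MSX0963)

Guangdong Basic and Applied Basic Research Foundation (2023A1515011220)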

Publication year

2024

Aeronautical Science & Technology (航空科学技术)

Chinese Aeronautical Establishment

Impact factor: 0.24

ISSN: 1007-5453

Number of references: 15
