Optimal Secure Tracking Control in Multi-UAVs Based on Online Reinforcement Learning
In Unmanned Aerial Vehicle (UAV) formation tracking missions, False Data Injection (FDI) attackers can inject misleading data into the control commands, which can prevent the UAVs from forming the specified formation configuration; a secure formation tracking controller is therefore needed. In this paper, the attack-defense process is modeled as a zero-sum graphical game in which the FDI attacker and the secure controller are the players: the attacker aims to maximize the cost function, whereas the secure controller aims to minimize it. Solving the game and obtaining the optimal secure control policy requires solving the Hamilton-Jacobi-Isaacs (HJI) equation, which is a coupled partial differential equation and is difficult to solve directly. Therefore, a finite-time convergent online reinforcement learning algorithm combined with an experience replay mechanism is introduced, and a critic-only neural network is used to approximate the value function and obtain the optimal secure control policy. A numerical simulation demonstrates the effectiveness of the proposed scheme.
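To make the critic-only idea concrete, here is a minimal sketch under strong simplifying assumptions; it is not the authors' method. It assumes a hypothetical scalar plant x_dot = a*x + b*u + k*d, where d stands in for the FDI signal, a hypothetical running cost Q*x^2 + R*u^2 - gamma^2*d^2, and a single critic weight w with basis phi(x) = x^2, so V(x) = w*x^2. The controller minimizes and the attacker maximizes, giving u* = -1/2 R^-1 b dV/dx and d* = 1/(2 gamma^2) k dV/dx, and in this scalar case the HJI equation reduces to a quadratic (game Riccati) equation whose root can be checked in closed form. The paper's multi-agent graphical game, its neural network, and its finite-time convergent update law are not reproduced; the sketch swaps in a standard normalized-gradient critic law plus replay of stored states. All names (`policies`, `hji_residual`, `riccati_solution`, `train_critic`) and parameter values are illustrative.

```python
import math

# Hypothetical scalar plant x_dot = A*x + B*u + K*d, where d models the FDI signal.
A, B, K = -1.0, 1.0, 1.0
# Hypothetical cost weights: running cost Q*x^2 + R*u^2 - GAMMA^2*d^2.
Q, R, GAMMA = 1.0, 1.0, 2.0


def policies(w, x):
    """Controller (minimizer) and attacker (maximizer) from the critic V(x) = w*x**2."""
    grad_v = 2.0 * w * x
    u = -0.5 / R * B * grad_v              # u* = -1/2 R^-1 b dV/dx
    d = 0.5 / GAMMA**2 * K * grad_v        # d* = 1/(2 gamma^2) k dV/dx
    return u, d


def hji_residual(w, x):
    """HJI (Bellman) residual e at state x and its regressor sigma = dphi/dx * x_dot."""
    u, d = policies(w, x)
    x_dot = A * x + B * u + K * d
    sigma = 2.0 * x * x_dot                # basis phi(x) = x**2
    e = Q * x**2 + R * u**2 - GAMMA**2 * d**2 + w * sigma
    return e, sigma


def riccati_solution():
    """Stabilizing root p of Q + 2*A*p - (B^2/R - K^2/GAMMA^2)*p^2 = 0, for checking."""
    c = B**2 / R - K**2 / GAMMA**2
    return (A + math.sqrt(A**2 + c * Q)) / c


def train_critic(x0=2.0, dt=0.01, steps=5000, alpha=5.0, buffer_size=20):
    """Online critic tuning along one trajectory, replaying stored states each step."""
    w, x = 0.0, x0
    replay = []
    for t in range(steps):
        # Store a state every 10 steps until the buffer is full; these samples keep
        # the update excited after the closed-loop state has decayed toward zero.
        if len(replay) < buffer_size and t % 10 == 0:
            replay.append(x)
        # Normalized-gradient critic law summed over the current and replayed samples.
        update = 0.0
        for xs in [x] + replay:
            e, sigma = hji_residual(w, xs)
            update += sigma / (1.0 + sigma**2) ** 2 * e
        w -= alpha * dt * update
        # Advance the plant one Euler step under the current control and attack.
        u, d = policies(w, x)
        x += dt * (A * x + B * u + K * d)
    return w
```

Calling `train_critic()` and comparing it with `riccati_solution()` shows whether the learned weight agrees with the closed-form value. The replay buffer matters here: without it, the state decays toward zero under the learned policy, the regressor sigma vanishes, and learning stalls. Reusing stored nonzero states is one simple way to relax that excitation requirement.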

Keywords: FDI attack; multi-UAVs; online reinforcement learning; optimal control; zero-sum graphical game

Gong Zhenyu, Yang Feisheng


Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China


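Funding: National Natural Science Foundation of China (62073269); Aeronautical Science Foundation of China (2020Z034053002); Key Research and Development Program of Shaanxi Province (2022GY-244); Natural Science Foundation of Chongqing (CSTB2022NSCQ-MSX0963); Guangdong Basic and Applied Basic Research Foundation (2023A1515011220)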

2024

Aeronautical Science & Technology
Chinese Aeronautical Establishment

Impact factor: 0.24
ISSN:1007-5453
Year, volume (issue): 2024, 35(4)