Chinese Journal of Aeronautics, 2024, Vol. 37, Issue 7: 391-405. DOI: 10.1016/j.cja.2024.03.025

Tube-based robust reinforcement learning for autonomous maneuver decision for UCAVs

Lixin WANG (1), Sizhuang ZHENG (1), Haiyin PIAO (2), Changqian LU (2), Ting YUE (1), Hailiang LIU (1)

Author information

  • 1. School of Aeronautical Science and Engineering, Beihang University, Beijing 100191, China
  • 2. Shenyang Aircraft Design & Research Institute, Shenyang 110035, China

Abstract

Reinforcement Learning (RL) algorithms enhance intelligence of air combat Autonomous Maneuver Decision (AMD) policy, but they may underperform in target combat environments with disturbances. To enhance the robustness of the AMD strategy learned by RL, this study proposes a Tube-based Robust RL (TRRL) method. First, this study introduces a tube to describe reachable trajectories under disturbances, formulates a method for calculating tubes based on sum-of-squares programming, and proposes the TRRL algorithm that enhances robustness by utilizing tube size as a quantitative indicator. Second, this study introduces offline techniques for regressing the tube size function and establishing a tube library before policy learning, aiming to eliminate complex online tube solving and reduce the computational burden during training. Furthermore, an analysis of the tube library demonstrates that the mitigated AMD strategy achieves greater robustness, as smaller tube sizes correspond to more cautious actions. This finding highlights that TRRL enhances robustness by promoting a conservative policy. To effectively balance aggressiveness and robustness, the proposed TRRL algorithm introduces a "laziness factor" as a weight of robustness. Finally, combat simulations in an environment with disturbances confirm that the AMD policy learned by the TRRL algorithm exhibits superior air combat performance compared to selected robust RL baselines.

Key words

Air combat / Autonomous maneuver decision / Robust reinforcement learning / Tube-based algorithm / Combat simulation

Publication year

2024
Chinese Journal of Aeronautics
Chinese Society of Aeronautics and Astronautics

Indexed in: CSTPCD, EI
Impact factor: 0.847
ISSN: 1000-9361
References: 2