Tube-based robust reinforcement learning for autonomous maneuver decision for UCAVs

Reinforcement Learning (RL) algorithms enhance the intelligence of air combat Autonomous Maneuver Decision (AMD) policies, but they may underperform in target combat environments with disturbances. To enhance the robustness of the AMD strategy learned by RL, this study proposes a Tube-based Robust RL (TRRL) method. First, this study introduces a tube to describe the reachable trajectories under disturbances, formulates a method for calculating tubes based on sum-of-squares programming, and proposes the TRRL algorithm, which enhances robustness by utilizing tube size as a quantitative indicator. Second, this study introduces offline techniques for regressing the tube-size function and establishing a tube library before policy learning, aiming to eliminate complex online tube solving and reduce the computational burden during training. Furthermore, an analysis of the tube library demonstrates that the resulting AMD strategy achieves greater robustness, as smaller tube sizes correspond to more cautious actions. This finding highlights that TRRL enhances robustness by promoting a conservative policy. To effectively balance aggressiveness and robustness, the proposed TRRL algorithm introduces a "laziness factor" as a weight of robustness. Finally, combat simulations in an environment with disturbances confirm that the AMD policy learned by the TRRL algorithm exhibits superior air combat performance compared to selected robust RL baselines.
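The abstract describes the TRRL pipeline only at a high level; the exact objective, tube-size regression, and reward shaping are given in the full paper. As a purely illustrative sketch under stated assumptions, the Python snippet below uses a hypothetical precomputed tube library (a nearest-neighbour lookup standing in for the regressed tube-size function) and shows one plausible way a tube-size penalty weighted by a "laziness factor" could be folded into an RL reward. All names and parameters here (TubeLibrary, shaped_reward, laziness) are assumptions for illustration, not the authors' implementation.

import numpy as np

class TubeLibrary:
    """Hypothetical offline tube library: tube sizes regressed before policy learning."""
    def __init__(self, keys, sizes):
        self.keys = keys      # representative (state, action) vectors, shape (N, d)
        self.sizes = sizes    # tube size associated with each key, shape (N,)

    def query(self, state, action):
        # Nearest-neighbour lookup stands in for the regressed tube-size function.
        key = np.concatenate([state, action])
        idx = int(np.argmin(np.linalg.norm(self.keys - key, axis=1)))
        return float(self.sizes[idx])

def shaped_reward(base_reward, tube_size, laziness=0.1):
    """Penalize maneuvers whose disturbed reachable set (tube) is large.
    A larger 'laziness' factor weights robustness more heavily, favouring
    conservative actions with smaller tubes over aggressive ones."""
    return base_reward - laziness * tube_size

# Toy usage with random placeholders for combat state/action vectors.
rng = np.random.default_rng(0)
library = TubeLibrary(keys=rng.normal(size=(256, 6)), sizes=rng.uniform(0.1, 2.0, size=256))
state, action = rng.normal(size=4), rng.normal(size=2)
reward = shaped_reward(base_reward=1.0, tube_size=library.query(state, action), laziness=0.2)
print(f"tube-penalized reward: {reward:.3f}")

This mirrors the abstract's observation that smaller tube sizes correspond to more cautious actions: penalizing tube size steers learning toward a conservative policy, while the laziness factor trades that conservatism against aggressiveness.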

Keywords: Air combat; Autonomous maneuver decision; Robust reinforcement learning; Tube-based algorithm; Combat simulation

Lixin WANG, Sizhuang ZHENG, Haiyin PIAO, Changqian LU, Ting YUE, Hailiang LIU


School of Aeronautical Science and Engineering, Beihang University, Beijing 100191, China

Shenyang Aircraft Design & Research Institute, Shenyang 110035, China

Chinese Journal of Aeronautics
Chinese Society of Aeronautics and Astronautics

Indexed in: CSTPCD, EI
Impact factor: 0.847
ISSN: 1000-9361
Year, Volume (Issue): 2024, 37(7)