Research on Autonomous Maneuver Decision Method for Unmanned Aerial Combat Based on an Improved PPO Algorithm
Deep reinforcement learning makes it possible for unmanned aerial vehicles to make maneuver decisions autonomously. This paper proposes an autonomous air combat maneuver decision-making method for an unmanned combat aerial vehicle (UCAV) that combines reconstructed situational assessment models with the proximal policy optimization (PPO) algorithm, providing effective strategy choices for 1 vs 1 within-visual-range (WVR) air combat. To address the problem of low model fidelity, the paper first establishes a six-degree-of-freedom dynamic model of the UCAV and defines the attack mode of WVR air combat. Then, to make the situational assessment model describe air combat more adequately, it reconstructs the angle, speed, distance, and altitude situational functions based on a division of air combat states, and proposes a new situational function that describes the potential for maneuver. For the reward function, in addition to rule-based sparse rewards, sub-target rewards are established based on transitions between air combat states, and shaping reward functions are designed from the situational functions to strengthen guidance during training. Finally, an expert system is designed as an opponent to evaluate the proposed method on the high-fidelity air combat simulation platform JSBSim. Simulation results show that, against both fixed-maneuver opponents and expert-system opponents, the proposed method effectively improves the convergence speed of the algorithm and the win rate of the agent.
PPO algorithm; maneuver potential; six-degree-of-freedom aircraft model; situation function; WVR air combat; expert system
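The three-part reward described in the abstract (rule-based sparse reward, sub-target reward on air combat state transitions, and a shaping reward derived from the situational functions) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the state labels, the weights, the discount factor, and the potential-based form of the shaping term (discounted next situation value minus previous situation value) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ordering of air combat states; the paper's actual division is not given here.
STATE_RANK = {"disadvantage": 0, "neutral": 1, "advantage": 2}


@dataclass
class StepInfo:
    outcome: Optional[str]   # "win", "loss", or None while the engagement continues
    prev_state: str          # air combat state before the step
    next_state: str          # air combat state after the step
    prev_situation: float    # aggregated situational value before the step (assumed scalar)
    next_situation: float    # aggregated situational value after the step


def sparse_reward(step: StepInfo) -> float:
    """Rule-based terminal reward: nonzero only when the engagement ends."""
    if step.outcome == "win":
        return 1.0
    if step.outcome == "loss":
        return -1.0
    return 0.0


def sub_target_reward(step: StepInfo, bonus: float = 0.1) -> float:
    """Reward a transition to a better air combat state, penalize a worse one."""
    return bonus * (STATE_RANK[step.next_state] - STATE_RANK[step.prev_state])


def shaping_reward(step: StepInfo, gamma: float = 0.99) -> float:
    """Potential-based shaping (an assumed form) using the situational value as potential."""
    return gamma * step.next_situation - step.prev_situation


def total_reward(step: StepInfo, shaping_weight: float = 1.0) -> float:
    """Sum of the three components; the weighting scheme is illustrative only."""
    return (
        sparse_reward(step)
        + sub_target_reward(step)
        + shaping_weight * shaping_reward(step)
    )


if __name__ == "__main__":
    step = StepInfo(None, "neutral", "advantage", 0.2, 0.5)
    print(total_reward(step))
```

In this sketch the sparse term carries the final outcome, while the other two terms give the agent a denser signal during an engagement, which is the role the abstract assigns to the sub-target and shaping rewards.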