Research on Autonomous Decision-making in Air-combat Based on Improved Proximal Policy Optimization
To address the problems of high information redundancy and slow convergence of traditional reinforcement learning in air-combat autonomous decision-making, a proximal policy optimization (PPO) method for air-combat autonomous decision-making, based on dual observation and a composite reward, is proposed. A dual observation space, containing interaction information as the main component and individual feature information as a supplement, is designed to reduce the influence of redundant battlefield information on the training efficiency of the decision model. A composite reward function combining result rewards and process rewards is designed to improve convergence speed. The generalized advantage estimator (GAE) is applied in the PPO algorithm to improve the accuracy of advantage function estimation. Simulation results show that the decision-making model can make precise autonomous decisions and complete air-combat tasks according to the battlefield situation in two types of experimental scenarios: against fixed-program opponents and against matrix-game opponents.
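The generalized advantage estimator mentioned above can be sketched as follows. This is a minimal illustration of standard GAE, not the paper's implementation; the discount factor `gamma` and smoothing parameter `lam` are assumed hyperparameters, and the trajectory arrays are hypothetical inputs.

```python
import numpy as np

def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation (Schulman et al.):
    an exponentially weighted sum of one-step TD residuals,
    computed backward over a trajectory.

    `values` must have length len(rewards) + 1, with the final
    entry being the bootstrap value of the last next-state.
    `dones[t]` is 1.0 if the episode terminates at step t."""
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # One-step TD residual: delta_t = r_t + gamma*V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Recursive accumulation: A_t = delta_t + gamma*lam*A_{t+1}
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    return advantages
```

In PPO, these advantages multiply the clipped probability ratio in the surrogate objective; the trade-off between bias (low `lam`) and variance (high `lam`) is what GAE controls.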