Research on Autonomous Decision-Making in Air Combat Based on Improved Proximal Policy Optimization

Traditional reinforcement learning applied to autonomous air-combat decision-making suffers from highly redundant information and slow convergence. To address these problems, a proximal policy optimization (PPO) algorithm for autonomous air-combat decision-making based on dual observation and composite reward is proposed. First, a dual observation space is designed in which interaction information is the primary input and individual feature information is supplementary, reducing the impact of highly redundant battlefield information on training efficiency. Second, a composite reward function that combines result rewards with process rewards is designed to speed up convergence during training. Third, generalized advantage estimation (GAE) is adopted to improve the PPO algorithm by making the advantage function estimates more accurate. Simulation results show that in two experimental scenarios, against a fixed-programmed opponent and against a matrix-game opponent, the decision model trained with this algorithm makes accurate autonomous decisions according to the battlefield situation and completes the air-combat tasks.
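
The algorithmic core named in the abstract, PPO improved with generalized advantage estimation, can be illustrated with a minimal sketch. It is not the authors' implementation: the dual observation space, composite reward, network architecture, and hyperparameters are not given in the abstract, so gamma = 0.99, lam = 0.95 and clip_eps = 0.2 are common defaults assumed for illustration, and the function names compute_gae and ppo_clip_loss are hypothetical. GAE forms the TD residual δ_t = r_t + γV(s_{t+1}) - V(s_t) and accumulates it backwards as A_t = δ_t + γλA_{t+1}, with λ trading bias against variance; PPO then maximizes the clipped surrogate min(ρ_t A_t, clip(ρ_t, 1 - ε, 1 + ε) A_t), where ρ_t is the ratio of new to old policy probabilities.

```python
import math
from typing import List, Sequence


def compute_gae(rewards: Sequence[float], values: Sequence[float],
                dones: Sequence[bool], gamma: float = 0.99,
                lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation (GAE) over one rollout.

    values holds len(rewards) + 1 entries; the last is the bootstrap value
    of the state reached after the final step. dones[t] is True when the
    episode ended at step t, which stops bootstrapping and accumulation.
    Default gamma/lam are assumed common values, not the paper's settings.
    """
    if len(values) != len(rewards) + 1 or len(dones) != len(rewards):
        raise ValueError("expected len(values) == len(rewards) + 1 == len(dones) + 1")
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Sweep backwards so each A_t can reuse A_{t+1}.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        # Recursion: A_t = delta_t + gamma * lam * A_{t+1}
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    return advantages


def ppo_clip_loss(new_logp: Sequence[float], old_logp: Sequence[float],
                  advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Negated PPO clipped surrogate objective, averaged over samples."""
    if not advantages:
        raise ValueError("empty batch")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio rho_t = pi_new(a|s) / pi_old(a|s)
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing this loss maximizes the surrogate objective
    return -total / len(advantages)


if __name__ == "__main__":
    # Toy 3-step episode with made-up numbers, ending in a terminal reward
    rewards = [0.0, 0.0, 1.0]
    values = [0.5, 0.6, 0.7, 0.0]
    dones = [False, False, True]
    adv = compute_gae(rewards, values, dones)
    print("advantages:", adv)
    print("value targets:", [a + v for a, v in zip(adv, values[:-1])])
    print("loss at ratio 1:", ppo_clip_loss([0.0] * 3, [0.0] * 3, adv))
```

Setting lam = 1 recovers Monte Carlo advantages (discounted return minus V(s_t)), while lam = 0 reduces them to one-step TD residuals. In common PPO practice the advantages are normalized per batch and A_t + V(s_t) serves as the value-function target; how the paper sets these details is not stated in the abstract.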

Keywords: reinforcement learning; air-combat autonomous decision-making; dual observation; composite reward; generalized advantage estimator

Qian Dianwei (钱殿伟), Qi Hongmin (齐红敏), Liu Zhen (刘振), Zhou Zhiming (周志明), Yi Jianqiang (易建强)

School of Control and Computer Engineering, North China Electric Power University, Beijing 102206

Institute of Automation, Chinese Academy of Sciences, Beijing 100190

2024

Journal of System Simulation
Beijing Simulation Center; Chinese Association for System Simulation

CSTPCD; Peking University Core Journal
Impact factor: 0.551
ISSN: 1004-731X
Year, volume (issue): 2024, 36(9)