Research on an Autonomous Maneuver Decision Method for UAV Air Combat Based on an Improved PPO Algorithm

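The application of deep reinforcement learning offers new possibilities for autonomous maneuver decision-making by unmanned aerial vehicles (UAVs). An autonomous air combat maneuver decision-making method for UAVs is proposed that combines a reconstructed situation assessment model with the proximal policy optimization (PPO) algorithm, providing effective strategy choices for one-versus-one close-range air combat. First, a high-fidelity six-degree-of-freedom UAV model and a close-range air combat attack model are established. Second, the angle, speed, distance, and altitude situation functions are reconstructed based on a division of air combat states, and a new situation assessment indicator describing maneuver potential is proposed. Next, a shaping reward is designed from the situation functions and combined with a rule-based sparse reward and a sub-goal reward based on state transitions to form the algorithm's reward function, which strengthens the guidance available to the reinforcement learning algorithm. Finally, an expert system is designed as the opponent, and the work is evaluated on the high-fidelity air combat simulation platform (JSBSim). Simulations verify that, against both fixed-maneuver opponents and the expert-system opponent, the agent applying the proposed method achieves improved convergence speed and win rate.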
Research on Autonomous Maneuver Decision Method for Unmanned Aerial Combat Based on an Improved PPO Algorithm
The application of deep reinforcement learning opens new possibilities for autonomous maneuver decision-making by unmanned aerial vehicles. This paper proposes an autonomous air combat maneuver decision-making method for unmanned combat aerial vehicles (UCAVs) that combines a reconstructed situational assessment model with the proximal policy optimization (PPO) algorithm, providing effective strategy choices for 1 vs 1 within visual range (WVR) air combat. First, to address the problem of low model fidelity, a six-degree-of-freedom dynamic model of the UCAV is established and the attack mode of WVR air combat is defined. Second, to improve how adequately the situational assessment model describes air combat, the angle, speed, distance, and altitude situational functions are reconstructed based on the division of air combat states, and a new situational function describing maneuver potential is proposed. Third, for reward function design, in addition to rule-based sparse rewards, sub-goal rewards are established based on transitions between air combat states, and shaping rewards are designed based on the situational functions to strengthen guidance. Finally, an expert system is designed as the opponent, and the work is evaluated on the high-fidelity air combat simulation platform (JSBSim). Simulation results show that, against both fixed-maneuver opponents and expert-system opponents, the agent using the proposed method achieves improved convergence speed and win rate.
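The three-part reward described in the abstract (rule-based sparse reward, sub-goal reward on state transitions, and shaping reward from the situational functions) can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: the potential-based form of the shaping term, the integer ordering of air combat states, the weights, and all function and parameter names are hypothetical and are not the authors' implementation.

```python
from typing import Optional


def shaping_reward(situation_prev: float, situation_next: float,
                   gamma: float = 0.99) -> float:
    """Shaping term from a scalar situation score.

    Assumption: a potential-based form, gamma * Phi(s') - Phi(s), where the
    situation score plays the role of the potential Phi.
    """
    return gamma * situation_next - situation_prev


def subgoal_reward(state_prev: int, state_next: int, bonus: float = 1.0) -> float:
    """Sub-goal term: reward a transition to a more favorable combat state.

    Assumption: combat states are encoded as integers, larger = more favorable.
    """
    if state_next > state_prev:
        return bonus
    if state_next < state_prev:
        return -bonus
    return 0.0


def sparse_reward(outcome: Optional[str], win: float = 10.0,
                  loss: float = -10.0) -> float:
    """Rule-based terminal reward; zero while the engagement is undecided."""
    if outcome == "win":
        return win
    if outcome == "loss":
        return loss
    return 0.0


def total_reward(situation_prev: float, situation_next: float,
                 state_prev: int, state_next: int,
                 outcome: Optional[str] = None, gamma: float = 0.99,
                 w_shaping: float = 1.0, w_subgoal: float = 1.0,
                 w_sparse: float = 1.0) -> float:
    """Weighted sum of the three terms (weights are hypothetical)."""
    return (w_shaping * shaping_reward(situation_prev, situation_next, gamma)
            + w_subgoal * subgoal_reward(state_prev, state_next)
            + w_sparse * sparse_reward(outcome))
```

In this sketch the dense shaping and sub-goal terms provide a learning signal at every step, while the sparse term only fires at the end of an engagement; a PPO agent would receive `total_reward(...)` as its per-step reward.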

PPO algorithm; maneuver potential; six degree of freedom aircraft model; situation function; WVR air combat; expert system

张欣、董文瀚、尹晖、贺磊、张聘、李敦旺


Aviation Engineering School, Air Force Engineering University, Xi'an 710038

Teaching and Research Support Center, Air Force Engineering University, Xi'an 710051

PPO algorithm; maneuver potential; six-degree-of-freedom aircraft model; situation function; close-range air combat; expert system

2024

Journal of Air Force Engineering University
Scientific Research Department, Air Force Engineering University


CSTPCD; Peking University Core Journals
Impact factor: 0.55
ISSN:2097-1915
Year, volume (issue): 2024, 25(6)