Autonomous Avoidance Decision Method for Aircraft Using Reinforcement Learning

Aircraft face many unpredictable threats and obstacles while carrying out missions. To safeguard the aircraft, this paper studies an autonomous avoidance decision method for aircraft facing threat targets. First, taking into account the mutual influence between the behavior of the aircraft and that of the threat target, a trajectory prediction algorithm based on a deep long short-term memory (LSTM) neural network is proposed to predict the threat target's future trajectory. Second, the prediction information is used to construct a Markov decision process for evasive maneuvers in an interception scenario. Then, an avoidance decision method based on an improved twin delayed deep deterministic policy gradient algorithm (P-TD3) is designed, with maximizing the total return of the avoidance process as the optimization objective, to achieve autonomous avoidance decisions. Finally, experiments on a virtual simulation interaction platform show that the method improves the convergence speed of the network and achieves an 84% avoidance success rate, which raises the probability of successfully avoiding potential threats and helps enhance the autonomy and safety of the aircraft.
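
The abstract names TD3 as the base algorithm but gives neither its update rule nor the P-TD3 modifications. For orientation only, the sketch below computes the critic target of standard TD3 (clipped double-Q learning with target policy smoothing), not the authors' improved variant. The function name, argument names, and default hyperparameter values are illustrative assumptions.

```python
import numpy as np

def td3_target(reward, done, next_state, actor_target, q1_target, q2_target,
               rng, gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Standard TD3 critic target: y = r + gamma * (1 - done) * min(Q1', Q2').

    All names and default values here are hypothetical; this is plain TD3,
    not the P-TD3 variant described in the abstract.
    """
    # Target policy smoothing: perturb the target action with clipped Gaussian noise.
    action = actor_target(next_state)
    noise = np.clip(rng.normal(0.0, noise_std, size=action.shape),
                    -noise_clip, noise_clip)
    next_action = np.clip(action + noise, -max_action, max_action)

    # Clipped double-Q: use the smaller of the two target-critic estimates
    # to reduce overestimation of the action value.
    q_min = np.minimum(q1_target(next_state, next_action),
                       q2_target(next_state, next_action))

    # Bootstrapped target; terminal transitions (done = 1) keep only the reward.
    return reward + gamma * (1.0 - done) * q_min
```

Here `actor_target`, `q1_target`, and `q2_target` can be any callables that map batched NumPy arrays to arrays; in a full agent they would be slowly updated copies of the actor and the two critics, and the actor itself would be updated less often than the critics.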

hypersonic aircraft; reinforcement learning; twin delayed deep deterministic policy gradient; autonomous avoidance; maneuver decision

Dou Liqian (窦立谦), Ren Mengyuan (任梦圆), Zhang Xiuyun (张秀云), Zong Qun (宗群)

Tianjin University, Tianjin 300072


2024

Aeronautical Science & Technology (航空科学技术)
Chinese Aeronautical Establishment (中国航空研究院)


Impact factor: 0.24
ISSN: 1007-5453
Year, volume (issue): 2024, 35(6)