
Flight conflict resolution method based on relative entropy inverse reinforcement learning

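To address flight conflict resolution on air routes, a flight conflict resolution method based on relative entropy inverse reinforcement learning is proposed. First, a relative-entropy-based inverse reinforcement learning algorithm learns the implicit prior knowledge of controllers from historical flight trajectory data and expresses it quantitatively as a reward function. The reward function is then introduced into a conflict resolution model based on deep reinforcement learning, guiding training so that the model is updated toward resolutions resembling those of controllers. Experimental results show that the resolution model can learn controllers' prior knowledge and that the conflict resolution rate on the test set exceeds 73%. The study offers a useful reference for reducing controller workload and improving the safety of air traffic control.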
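As a rough illustration of how a learned reward function can steer a value-based learner, the sketch below computes a one-step temporal-difference target in which the environment reward is replaced by a learned one. The linear form of the reward, the feature vector, and the names `learned_reward` and `td_target` are assumptions made for illustration; the abstract does not specify how the reward is parameterized.

```python
def learned_reward(theta, features):
    """Reward as a weighted sum of state-action features (assumed linear form)."""
    return sum(t * f for t, f in zip(theta, features))


def td_target(theta, features, next_q_values, gamma=0.99, done=False):
    """One-step TD target that uses the learned reward instead of a hand-made one.

    theta         -- reward weights recovered by inverse reinforcement learning
    features      -- hypothetical features of the current state-action pair
    next_q_values -- Q estimates for every action in the next state
    """
    reward = learned_reward(theta, features)
    if done:
        # Terminal transition: no bootstrapping from the next state.
        return reward
    return reward + gamma * max(next_q_values)
```

In a DQN-style training loop, a target of this kind would be compared with the network's current Q estimate to form the loss, so actions that score well under the learned reward are reinforced.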
Flight conflict resolution method based on relative entropy inverse reinforcement learning
The primary objective of air traffic management is to ensure the safety of aircraft in flight. Flight conflicts can lead to hazardous proximity or even collisions, with severe consequences, so auxiliary tools that help controllers resolve flight conflicts are essential. This article aims to make control decision-support tools more personalized and to improve controllers' acceptance of the conflict resolution solutions these tools provide. First, an inverse reinforcement learning method based on relative entropy is used to extract implicit controller instruction strategies from aircraft flight trajectory data and to represent them as reward functions. The flight conflict resolution problem is then modeled as a Markov decision process, and a deep reinforcement learning method (the DQN algorithm) is used to train the model under the guidance of this reward function. The objective is to raise both the success rate of the resolution model and the degree to which its strategies are personalized. The article also introduces analysis indicators from two perspectives: safety and applicability. Finally, a simulation system based on the Base of Aircraft Data (BADA) database is used to generate 5,000 flight conflict scenarios, of which 4,000 are used for model training and the remaining 1,000 are used to verify the effectiveness of the proposed method. Experimental results show that, guided by a reward function that incorporates controller strategies, the resolution model steadily improves both its conflict resolution success rate and its similarity to controller strategies. During the testing phase, the successful resolution rate exceeds 70%. This result indicates that the inverse reinforcement learning method based on relative entropy effectively learns the empirical knowledge of controllers, thereby enhancing the efficiency and personalization level of the resolution model. The method offers a novel approach to studying and improving the personalization of control schemes, which has practical significance for enhancing the efficiency of air traffic operations and ensuring airspace safety.
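The reward-learning step can be sketched as follows. This is a minimal illustration of the general relative entropy inverse reinforcement learning idea, not the authors' implementation: it assumes a reward that is linear in trajectory features, assumes the comparison trajectories are drawn from the baseline distribution itself (so the importance ratio is 1), and omits regularization. The feature vectors and function names are hypothetical.

```python
import math


def feature_mean(rows, weights=None):
    """Weighted mean of feature vectors (uniform weights if none are given)."""
    if weights is None:
        weights = [1.0 / len(rows)] * len(rows)
    dim = len(rows[0])
    return [sum(w * r[k] for w, r in zip(weights, rows)) for k in range(dim)]


def reirl_gradient(theta, expert_features, sampled_features):
    """Expert feature mean minus the reward-weighted mean over sampled trajectories."""
    scores = [sum(t * f for t, f in zip(theta, row)) for row in sampled_features]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    mu_expert = feature_mean(expert_features)
    mu_model = feature_mean(sampled_features, weights)
    return [e - m for e, m in zip(mu_expert, mu_model)]


def fit_reward_weights(expert_features, sampled_features, lr=0.5, steps=1000):
    """Plain gradient ascent on the reward weights, starting from zero."""
    theta = [0.0] * len(expert_features[0])
    for _ in range(steps):
        grad = reirl_gradient(theta, expert_features, sampled_features)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta
```

When the gradient approaches zero, the reward-weighted feature expectations of the sampled trajectories match those of the expert (controller) trajectories, which is the sense in which the recovered weights encode controller behavior.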

Keywords: safety engineering; air traffic control; flight conflict resolution; inverse reinforcement learning; deep reinforcement learning

Sui Dong (隋东), Dong Jintao (董金涛)


College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China


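Funding: Civil Aviation Administration of China funded project ([2022] No. 125); Nanjing University of Aeronautics and Astronautics Research and Practice Innovation Program (xcxjh20220710)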

2024

Journal of Safety and Environment
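Sponsors: Beijing Institute of Technology; Chinese Society for Environmental Sciences; China Occupational Safety and Health Association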


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.943
ISSN:1009-6094
Year, volume (issue): 2024, 24(3)