Journal of Systems Engineering and Electronics, 2024, Vol. 35, Issue 3: 644-665. DOI: 10.23919/JSEE.2024.000022

UAV maneuvering decision-making algorithm based on deep reinforcement learning under the guidance of expert experience

ZHAN Guang 1, ZHANG Kun 2, LI Ke 1, PIAO Haiyin 1

Author information

  • 1. School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, China
  • 2. School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, China; Science and Technology on Electro-Optic Control Laboratory, Luoyang 471009, China

Abstract

Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders in the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in an interactive environment, where finding the optimal maneuvering decision-making policy has become one of the key issues for enabling UAV intelligence. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance towards area and guidance towards specific point tasks for the air-delivery process based on traditional air-to-surface fire control methods. Moreover, we construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs). In particular, we present a reward shaping method for the guidance towards area and guidance towards specific point tasks using a potential-based function and expert-guided advice. The proposed algorithm can accelerate the convergence of the maneuvering decision-making policy and increase the stability of the policy output during the later stage of the training process. The effectiveness of the proposed maneuvering decision-making policy is illustrated by the curves of training parameters and by extensive experimental results from testing the trained policy.
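
As a rough illustration of the potential-based reward shaping mentioned in the abstract, the sketch below adds the standard shaping term F(s, s') = γΦ(s') − Φ(s) to an environment reward. The potential function used here (negative Euclidean distance to a target point), the discount factor, and all function names are assumptions made for illustration; the abstract does not specify the paper's actual potential function or how the expert-guided advice enters the reward.

```python
import math

GAMMA = 0.99  # assumed discount factor, not taken from the paper


def potential(position, target):
    """Hypothetical potential: negative Euclidean distance to the target point."""
    return -math.dist(position, target)


def shaped_reward(env_reward, position, next_position, target, gamma=GAMMA):
    """Return env_reward plus the shaping term F = gamma * Phi(s') - Phi(s)."""
    shaping = gamma * potential(next_position, target) - potential(position, target)
    return env_reward + shaping
```

With this choice of potential, a step that brings the UAV closer to the target point receives a positive bonus and a step away from it receives a penalty, which is one common way to give the agent denser feedback during a guidance task.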

Key words

unmanned aerial vehicle (UAV) / maneuvering decision-making / autonomous air-delivery / deep reinforcement learning / reward shaping / expert experience

Funding

Key Research and Development Program of Shaanxi (2022GXLH-02-09)

Aeronautical Science Foundation of China (20200051053001)

Natural Science Foundation of Shaanxi Province (2020JM-147)

Publication year

2024
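
Journal of Systems Engineering and Electronics
Sponsors: Defense Technology Academy of China Aerospace Science and Industry Corporation; Chinese Society of Astronautics; Systems Engineering Society of China; China Simulation Federation

Indexed in: CSTPCD
Impact factor: 0.64
ISSN: 1004-4132
Number of references: 1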