Systems Engineering and Electronics, 2024, Vol. 46, Issue (10): 3506-3518. DOI: 10.12305/j.issn.1001-506X.2024.10.27

UAV formation path planning approach incorporating dynamic reward strategy

唐恒 (Tang Heng)¹, 孙伟 (Sun Wei)¹, 吕磊 (Lü Lei)¹, 贺若飞 (He Ruofei)², 吴建军 (Wu Jianjun)³, 孙昌浩 (Sun Changhao)⁴, 孙田野 (Sun Tianye)¹

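Author information

  • 1. School of Aerospace Science and Technology, Xidian University, Xi'an, Shaanxi 710118, China
  • 2. The 365th Research Institute, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
  • 3. Xi'an Aisheng UAV Technology Co., Ltd. (西安爱生无人机技术有限公司), Xi'an, Shaanxi 710065, China
  • 4. Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing 100094, China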

Abstract

To address unmanned aerial vehicle (UAV) formation path planning in unknown dynamic environments, this paper proposes an intelligent decision-making scheme for UAV formations based on the multi-agent twin delayed deep deterministic policy gradient algorithm incorporating a dynamic formation reward function (MATD3-IDFRF). First, the sparse reward function is extended for the obstacle-free environment. Then, the dynamic formation problem, a central concern in UAV formation path planning, is analyzed in depth: the formation should fly with a stable structure while fine-tuning its shape in time according to the surrounding environment. In essence, the spacing between every pair of UAVs should remain relatively stable while still being adjusted in response to the external environment. Accordingly, a reward function based on the optimal and the current spacing between each pair of UAVs is designed, from which a dynamic formation reward function is derived; combining it with the multi-agent twin delayed deep deterministic (MATD3) algorithm yields the MATD3-IDFRF algorithm. Finally, comparison experiments in a composite obstacle environment show that the proposed dynamic formation reward function raises the algorithm's success rate by 6.8%, raises the average converged reward by 2.3%, and reduces the formation deformation rate by 97%.
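The abstract does not state the reward formula, so the following is only an illustrative sketch of what a pairwise spacing reward of this kind could look like. It assumes each pair (i, j) is penalized by the absolute gap between its current distance d_ij and an optimal distance d*_ij, and it models the "dynamic" fine-tuning with a hypothetical rule that shrinks every target spacing when the nearest obstacle is close. The function names, the weight `w_form`, `safe_radius`, and `min_scale` are all assumptions, not the paper's MATD3-IDFRF definitions.

```python
import numpy as np


def pairwise_distances(pos: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix for an (n, 3) array of UAV positions."""
    diff = pos[:, None, :] - pos[None, :, :]   # (n, n, 3) displacement vectors
    return np.linalg.norm(diff, axis=-1)       # (n, n) distances


def adjusted_optimal_spacing(d_opt: np.ndarray, obstacle_dist: float,
                             safe_radius: float = 5.0,
                             min_scale: float = 0.6) -> np.ndarray:
    """HYPOTHETICAL dynamic rule: when the nearest obstacle is closer than
    safe_radius, shrink every target spacing linearly, never below min_scale."""
    scale = np.clip(obstacle_dist / safe_radius, min_scale, 1.0)
    return d_opt * scale


def formation_reward(pos: np.ndarray, d_opt: np.ndarray, obstacle_dist: float,
                     w_form: float = 0.1) -> float:
    """Assumed form: -w_form * sum over pairs i < j of |d_ij - d*_ij|."""
    d = pairwise_distances(pos)
    d_target = adjusted_optimal_spacing(d_opt, obstacle_dist)
    i, j = np.triu_indices(len(pos), k=1)      # each unordered pair counted once
    return -w_form * float(np.sum(np.abs(d[i, j] - d_target[i, j])))
```

For two UAVs 3 units apart with a target spacing of 2 and no obstacle within `safe_radius`, the sketch returns -0.1; a perfect formation returns 0. Summing over unordered pairs keeps the penalty symmetric, which matches the abstract's framing of the formation as "the spacing between every pair of UAVs".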
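The abstract names MATD3 as the underlying learner without giving its update equations. As background, here is a sketch of the standard TD3 critic target (clipped double-Q learning with target policy smoothing), written in the centralized-critic form commonly used for MATD3: each agent's twin target critics see the joint observation and joint action of all agents. Actors and critics are plain callables standing in for neural networks, and the default hyperparameters are illustrative assumptions, not the paper's settings. TD3's delayed actor updates are not shown.

```python
from typing import Callable, List, Optional, Sequence

import numpy as np


def td3_target(reward: float, done: float,
               next_obs: Sequence[np.ndarray],
               target_actors: Sequence[Callable[[np.ndarray], np.ndarray]],
               target_q1: Callable[[np.ndarray, np.ndarray], float],
               target_q2: Callable[[np.ndarray, np.ndarray], float],
               gamma: float = 0.95, noise_std: float = 0.2,
               noise_clip: float = 0.5, act_low: float = -1.0,
               act_high: float = 1.0,
               rng: Optional[np.random.Generator] = None) -> float:
    """Critic target y = r + gamma * (1 - done) * min(Q1', Q2') for one agent."""
    rng = np.random.default_rng() if rng is None else rng
    next_actions: List[np.ndarray] = []
    for obs_k, actor in zip(next_obs, target_actors):
        a = np.asarray(actor(obs_k), dtype=float)
        # Target policy smoothing: add clipped Gaussian noise to each target action
        noise = np.clip(rng.normal(0.0, noise_std, size=a.shape),
                        -noise_clip, noise_clip)
        next_actions.append(np.clip(a + noise, act_low, act_high))
    # Centralized critics receive the concatenated joint observation and action
    joint_obs = np.concatenate(next_obs)
    joint_act = np.concatenate(next_actions)
    # Clipped double-Q: use the smaller of the two target critic estimates
    q_min = min(target_q1(joint_obs, joint_act), target_q2(joint_obs, joint_act))
    return float(reward + gamma * (1.0 - done) * q_min)
```

In a multi-agent setting this target would be computed once per agent, with that agent's own reward (which, in MATD3-IDFRF, would include the formation term) and its own pair of target critics. Taking the minimum of two critics is what counters the Q-value overestimation that single-critic DDPG-style methods suffer from.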

Key words

reinforcement learning (RL) / reward function / unmanned aerial vehicle (UAV) / dynamic formation / path planning


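Funding

China University Industry-University-Research Innovation Fund (2021ZYA08004)

Xi'an Science and Technology Plan Project (2022JH-RGZN-0039)

Key Industry Innovation Chain Project of the Shaanxi Province Key Research and Development Program (2022ZDLGY03-01)

National Natural Science Foundation of China (62173330)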

Year of publication

2024
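Systems Engineering and Electronics
Sponsors: China Aerospace Science and Industry Defense Technology Research Institute (中国航天科工防御技术研究院), Chinese Society of Astronautics, Systems Engineering Society of China

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.847
ISSN: 1001-506X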