
UAV formation path planning approach incorporating a dynamic reward strategy

To address the unmanned aerial vehicle (UAV) formation path planning problem in unknown dynamic environments, an intelligent decision-making scheme for UAV formations is proposed, based on a multi-agent twin delayed deep deterministic policy gradient algorithm incorporating a dynamic formation reward function (MATD3-IDFRF). First, the sparse reward function is extended for the obstacle-free environment. Then, the dynamic formation problem at the core of UAV formation path planning is analyzed in depth: a UAV formation should fly with a stable structure while fine-tuning its shape according to the surrounding environment, which in essence means that the spacing between each pair of UAVs remains relatively stable yet is adjusted in response to the external environment. Accordingly, a reward function based on the optimal and current spacing between each pair of UAVs is designed, a dynamic formation reward function is built on it, and the result is combined with the multi-agent twin delayed deep deterministic (MATD3) algorithm to form the MATD3-IDFRF algorithm. Finally, comparison experiments in a complex obstacle environment show that the proposed dynamic formation reward function improves the algorithm's success rate by 6.8%, raises the average converged reward by 2.3%, and reduces the formation deformation rate by 97%.
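The abstract states only that the formation term of the reward is built from the optimal and the current spacing between each pair of UAVs; the exact expression is not reproduced here. A minimal Python sketch of one plausible formulation, assuming a simple absolute-deviation penalty with an illustrative gain k (the name pairwise_spacing_reward, the matrix d_opt, and the triangle example below are hypothetical, not taken from the paper):

import numpy as np

def pairwise_spacing_reward(positions, d_opt, k=1.0):
    # Penalize, over every UAV pair (i, j), the deviation of the current
    # inter-UAV distance from its optimal spacing d_opt[i, j]; the reward
    # is 0 when the formation exactly holds the desired geometry.
    n = len(positions)
    reward = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = np.linalg.norm(positions[i] - positions[j])
            reward -= k * abs(d_ij - d_opt[i, j])
    return reward

# Example: three UAVs that should hold an equilateral triangle of side 5 m.
positions = [np.array([0.0, 0.0]), np.array([5.0, 0.0]), np.array([2.5, 4.33])]
d_opt = np.full((3, 3), 5.0)
print(pairwise_spacing_reward(positions, d_opt))

In the full MATD3-IDFRF scheme such a term would presumably be combined with the extended sparse reward described above, with the optimal spacings adjusted to the surrounding environment rather than held fixed as in this sketch.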

reinforcement learning (RL); reward function; unmanned aerial vehicle (UAV); dynamic formation; path planning

Tang Heng (唐恒), Sun Wei (孙伟), Lyu Lei (吕磊), He Ruofei (贺若飞), Wu Jianjun (吴建军), Sun Changhao (孙昌浩), Sun Tianye (孙田野)


School of Aerospace Science and Technology, Xidian University, Xi'an, Shaanxi 710118, China

The 365th Research Institute, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China

Xi'an Aisheng UAV Technology Co., Ltd., Xi'an, Shaanxi 710065, China

Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing 100094, China



China University Industry-University-Research Innovation Fund (2021ZYA08004); Xi'an Science and Technology Plan (2022JH-RGZN-0039); Key Industrial Innovation Chain Project of the Shaanxi Province Key Research and Development Program (2022ZDLGY03-01); National Natural Science Foundation of China (62173330)

2024

Systems Engineering and Electronics
Sponsored by the China Aerospace Science and Industry Defense Technology Research Academy, the Chinese Society of Astronautics, and the Systems Engineering Society of China


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.847
ISSN: 1001-506X
Year, Volume (Issue): 2024, 46(10)