
UAV formation path planning approach incorporating a dynamic reward strategy

To address the unmanned aerial vehicle (UAV) formation path planning problem in unknown dynamic environments, an intelligent decision-making scheme for UAV formations is proposed, based on a multi-agent twin delayed deep deterministic policy gradient algorithm incorporating a dynamic formation reward function (MATD3-IDFRF). First, the sparse reward function is extended for the obstacle-free environment. Then, the dynamic formation problem at the core of UAV formation path planning is analyzed in depth: a UAV formation should fly with a stable structure while fine-tuning its shape according to the surrounding environment, which in essence means that the spacing between each pair of UAVs remains relatively stable yet is adjusted in response to the external environment. Accordingly, a reward function based on the optimal and current spacing between each pair of UAVs is designed, a dynamic formation reward function is built on it, and the result is combined with the multi-agent twin delayed deep deterministic (MATD3) algorithm to form the MATD3-IDFRF algorithm. Finally, comparison experiments in a complex obstacle environment show that the proposed dynamic formation reward function improves the algorithm's success rate by 6.8%, raises the average converged reward by 2.3%, and reduces the formation deformation rate by 97%.
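The abstract states only that the formation term of the reward is built from the optimal and the current spacing between each pair of UAVs; the exact expression is not reproduced here. A minimal Python sketch of one plausible formulation, assuming a simple absolute-deviation penalty with an illustrative gain k (the name pairwise_spacing_reward, the matrix d_opt, and the triangle example below are hypothetical, not taken from the paper):

import numpy as np

def pairwise_spacing_reward(positions, d_opt, k=1.0):
    # Penalize, over every UAV pair (i, j), the deviation of the current
    # inter-UAV distance from its optimal spacing d_opt[i, j]; the reward
    # is 0 when the formation exactly holds the desired geometry.
    n = len(positions)
    reward = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = np.linalg.norm(positions[i] - positions[j])
            reward -= k * abs(d_ij - d_opt[i, j])
    return reward

# Example: three UAVs that should hold an equilateral triangle of side 5 m.
positions = [np.array([0.0, 0.0]), np.array([5.0, 0.0]), np.array([2.5, 4.33])]
d_opt = np.full((3, 3), 5.0)
print(pairwise_spacing_reward(positions, d_opt))

In the full MATD3-IDFRF scheme such a term would presumably be combined with the extended sparse reward described above, with the optimal spacings adjusted to the surrounding environment rather than held fixed as in this sketch.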

reinforcement learning (RL); reward function; unmanned aerial vehicle (UAV); dynamic formation; path planning

Tang Heng (唐恒), Sun Wei (孙伟), Lyu Lei (吕磊), He Ruofei (贺若飞), Wu Jianjun (吴建军), Sun Changhao (孙昌浩), Sun Tianye (孙田野)


School of Aerospace Science and Technology, Xidian University, Xi'an, Shaanxi 710118, China

The 365th Research Institute, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China

Xi'an Aisheng UAV Technology Co., Ltd., Xi'an, Shaanxi 710065, China

Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing 100094, China



China University Industry-University-Research Innovation Fund (2021ZYA08004); Xi'an Science and Technology Plan (2022JH-RGZN-0039); Key Industrial Innovation Chain Project of the Shaanxi Province Key Research and Development Program (2022ZDLGY03-01); National Natural Science Foundation of China (62173330)

2024

Systems Engineering and Electronics
Sponsored by the China Aerospace Science and Industry Defense Technology Research Academy, the Chinese Society of Astronautics, and the Systems Engineering Society of China


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.847
ISSN: 1001-506X
Year, Volume (Issue): 2024, 46(10)