UAV formation path planning approach incorporating a dynamic reward strategy
唐恒 1, 孙伟 1, 吕磊 1, 贺若飞 2, 吴建军 3, 孙昌浩 4, 孙田野 1
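Author information
- 1. School of Aerospace Science and Technology, Xidian University, Xi'an, Shaanxi 710118, China
- 2. The 365th Research Institute, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
- 3. Xi'an ASN UAV Technology Co., Ltd., Xi'an, Shaanxi 710065, China
- 4. Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing 100094, China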
Abstract
For the problem of unmanned aerial vehicle (UAV) formation path planning in unknown dynamic environments, an intelligent decision scheme for UAV formations is proposed, based on the multi-agent twin delayed deep deterministic policy gradient algorithm incorporating a dynamic formation reward function (MATD3-IDFRF). First, the sparse reward function is extended for the obstacle-free environment. Then, the dynamic formation problem, a central concern in UAV formation path planning, is analyzed in depth: the formation should fly with a stable structure while fine-tuning its shape according to the surrounding environment. In essence, the spacing between every pair of UAVs remains relatively stable but is also adjusted slightly in response to the external environment. To this end, a reward function based on the optimal spacing and the current spacing between each pair of UAVs is designed. On this basis, a dynamic formation reward function is proposed and combined with the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm to obtain the MATD3-IDFRF algorithm. Finally, comparison experiments are designed. In the composite obstacle environment, the proposed dynamic formation reward function improves the algorithm's success rate by 6.8%, raises the average converged reward by 2.3%, and reduces the formation deformation rate by 97%.
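The abstract does not give the functional form of the pairwise-spacing reward, so the following is only a minimal sketch of the idea it describes: each pair of UAVs is penalized according to how far its current spacing deviates from its optimal spacing. The function name `pairwise_spacing_reward`, the linear penalty, and the `tolerance` and `weight` parameters are all assumptions made for illustration, not the authors' formulation.

```python
import itertools
import math


def pairwise_spacing_reward(positions, optimal_spacing, tolerance=0.0, weight=1.0):
    """Hypothetical formation reward (illustrative only, not the paper's formula).

    positions:       list of (x, y) coordinates, one per UAV.
    optimal_spacing: dict mapping each pair (i, j), i < j, to its desired distance.
    tolerance:       assumed slack; deviations below it are not penalized,
                     which leaves room for the formation to fine-tune its shape.
    weight:          assumed scale factor for the penalty.
    """
    reward = 0.0
    # Visit every unordered pair of UAVs exactly once.
    for i, j in itertools.combinations(range(len(positions)), 2):
        current = math.dist(positions[i], positions[j])
        deviation = abs(current - optimal_spacing[(i, j)])
        # Linear penalty on the part of the deviation that exceeds the slack.
        reward -= weight * max(0.0, deviation - tolerance)
    return reward


# Three UAVs whose pairwise distances are 3, 4 and 5.
positions = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
matched = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}
mismatched = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 4.0}

reward_matched = pairwise_spacing_reward(positions, matched)
reward_mismatched = pairwise_spacing_reward(positions, mismatched)
reward_with_slack = pairwise_spacing_reward(positions, mismatched, tolerance=0.5)
```

In this sketch the reward is zero when every spacing matches its target and becomes more negative as the formation deforms. A dynamic variant in the spirit of the abstract could vary `optimal_spacing` or `tolerance` with the surrounding obstacles, but how the paper does this is not stated here.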
Key words
reinforcement learning (RL) / reward function / unmanned aerial vehicle (UAV) / dynamic formation / path planning
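Funding
China University Industry-University-Research Innovation Fund (2021ZYA08004)
Xi'an Science and Technology Plan (2022JH-RGZN-0039)
Key Industrial Innovation Chain Project of the Shaanxi Provincial Key Research and Development Program (2022ZDLGY03-01)
National Natural Science Foundation of China (62173330)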
Publication year
2024