Multi-agent Formation Method Based on Dynamic Optimization of Proximal Policies

Original title: 基于近端策略动态优化的多智能体编队方法

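Abstract (translated from the Chinese): Unmanned aerial vehicle (UAV) cluster systems offer capability redundancy, strong resistance to destruction, and adaptability to complex scenarios, which enables efficient mission execution and information acquisition. In recent years, deep reinforcement learning has been introduced into UAV cluster formation control to address the dimension explosion of clusters and the difficulty of modelling cluster systems, but deep reinforcement learning suffers from problems such as low training efficiency. This paper proposes a cluster formation method based on an improved proximal policy optimization method. By introducing a dynamic estimation method as the evaluation mechanism, it addresses the slow convergence of the traditional proximal policy optimization method and its neglect of high-value actions, and effectively improves data utilization. Simulation experiments show that the method improves training efficiency, addresses the sample reuse problem, and delivers good decision-making performance.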

Abstract (English): Unmanned aerial vehicle (UAV) cluster systems offer redundancy of capabilities, high destruction resistance, and adaptability to complex scenarios, allowing more efficient mission execution and information acquisition. In recent years, deep reinforcement learning techniques have been incorporated into UAV cluster formation control methods to address the dimension explosion of clusters and the difficulty of modelling cluster systems. However, deep reinforcement learning has problems such as low training efficiency. This paper proposes a cluster formation method that uses an improved proximal policy optimization method. By using a dynamic estimation method as the evaluation mechanism, it addresses the slow convergence of the traditional proximal policy optimization method and its neglect of high-value actions, and effectively improves the data utilization rate. Simulation results show that the method improves training efficiency, addresses the sample reuse problem, and achieves good decision-making performance.
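For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal sketch of the clipped surrogate objective of standard proximal policy optimization. It is not the improved method of the paper: the abstract does not specify the dynamic estimation mechanism, so none is modelled here. The function name `ppo_clip_objective`, its parameter names, and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    Assumption: inputs are equal-length lists of per-sample log-probabilities
    under the new and old policies, plus advantage estimates. This is the
    generic baseline, not the paper's improved variant.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the clipping caps how much a single sample can raise the objective once the ratio exceeds `1 + clip_eps`. In the standard formulation every sample in a batch is treated in this same way.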

Keywords (English):

unmanned aerial vehicle clustering; deep reinforcement learning; proximal policy optimization; inverse reinforcement learning; cluster decision making

Authors:

全家乐、马先龙、沈昱恒

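Author affiliations:

School of Astronautics, Northwestern Polytechnical University, Xi'an, Shaanxi 710129

Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109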

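Keywords (Chinese):

无人机集群 (UAV cluster); 深度强化学习 (deep reinforcement learning); 近端策略优化 (proximal policy optimization); 逆强化学习 (inverse reinforcement learning); 集群决策 (cluster decision making)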

Funding:

National Natural Science Foundation of China

Grant number:

61473226

Publication year:

2024

空天防御 (Air & Space Defense)

Shanghai Electro-Mechanical Engineering Institute and Shanghai Jiao Tong University Press Co., Ltd.

ISSN: 2096-4641

Year, volume (issue): 2024, 7(2)

Number of references: 22