Multi-agent Formation Method Based on Dynamic Optimization of Proximal Policies

Unmanned aerial vehicle (UAV) cluster systems offer capability redundancy, strong resistance to destruction, and adaptability to complex scenarios, which enables efficient mission execution and information acquisition. In recent years, deep reinforcement learning has been introduced into UAV cluster formation control to address the dimensional explosion of clusters and the difficulty of modelling cluster systems, but it suffers from problems such as low training efficiency. This paper proposes a cluster formation method based on an improved proximal policy optimization method. By introducing a dynamic estimation method as the evaluation mechanism, the method addresses the slow convergence of traditional proximal policy optimization and its neglect of high-value actions, and effectively improves data utilization. Simulation experiments show that the method improves training efficiency, addresses the sample reuse problem, and achieves good decision-making performance.
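For orientation, the baseline that the abstract calls traditional proximal policy optimization is commonly written as a clipped surrogate objective. The sketch below shows that standard objective only. It is not the improved method proposed in the paper, whose dynamic estimation mechanism is not specified in this abstract. The function name, the argument names, and the clipping value 0.2 are illustrative assumptions.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current policy and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# When the two policies agree, the objective is the mean advantage.
same_policy_value = ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [1.0, 3.0])
```

Because the clip caps the gain from any single sample, a large positive advantage contributes at most `(1 + clip_eps)` times its value. This is one common reading of how a plain clipped update can underweight high-value actions, which is the limitation the abstract says the proposed evaluation mechanism targets.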

unmanned aerial vehicle clustering; deep reinforcement learning; proximal policy optimization; inverse reinforcement learning; cluster decision making

Quan Jiale (全家乐), Ma Xianlong (马先龙), Shen Yuheng (沈昱恒)

School of Astronautics, Northwestern Polytechnical University, Xi'an 710129, Shaanxi, China

Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China

National Natural Science Foundation of China (61473226)

2024

Air & Space Defense (空天防御)
Shanghai Electro-Mechanical Engineering Institute and Shanghai Jiao Tong University Press Co., Ltd.

ISSN:2096-4641
Year, volume (issue): 2024, 7(2)