Formation Navigation of Multi-Unmanned Surface Vehicles Based on ATMADDPG Algorithm

To improve the navigation capability of multi-unmanned-ship formation systems, an attention-mechanism-based multi-agent deep deterministic policy gradient (ATMADDPG: Attention Mechanism based Multi-Agent Deep Deterministic Policy Gradient) algorithm is proposed. In the training phase, the algorithm learns the best policy through a large number of trials; in the experimental phase, it applies that trained policy directly to obtain the best formation path. The simulation experiments use four identical "Baichuan" unmanned ships as test subjects. The results show that the ATMADDPG-based formation-keeping strategy achieves stable formation navigation of multiple unmanned ships and meets the formation-keeping requirements to a certain extent. Compared with the multi-agent deep deterministic policy gradient (MADDPG: Multi-Agent Deep Deterministic Policy Gradient) algorithm, the proposed ATMADDPG algorithm performs better in convergence speed, formation-keeping ability, and adaptability to environmental changes. It improves comprehensive navigation efficiency by about 80% and shows considerable application potential.

formation navigation of multi-unmanned surface vehicles; multi-agent deep deterministic policy gradient (MADDPG) algorithm; attention mechanism; deep reinforcement learning

Wang Siqi (王思琪), Guan Wei (关巍), Tong Min (佟敏), Zhao Shengye (赵盛烨)

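Navigation College, Dalian Maritime University, Dalian 116026, Liaoning, China

Jilin University Communication Design Institute Co., Ltd., Changchun 130012, China

Advanced Technology Research Institute, Liaoning Yihui Technology Group Co., Ltd., Shenyang 110170, China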

Supported by the National Natural Science Foundation of China (Grant No. 52171342)

Journal of Jilin University (Information Science Edition)
Jilin University

CSTPCD
Impact factor: 0.607
ISSN: 1671-5896
Year, volume (issue): 2024, 42(4)