Multi-ship collaborative collision avoidance strategy based on multi-agent deep reinforcement learning

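Abstract (translated from Chinese): To improve the coordination, safety, practicality, and energy efficiency of intelligent collision avoidance strategies in multi-ship encounters, a multi-agent Softmax deep double deterministic policy gradient algorithm with prioritized experience replay (PER-MASD3) is proposed under the centralized training with decentralized execution framework to solve the multi-ship cooperative collision avoidance problem. The algorithm not only addresses the value estimation bias of the twin delayed deep deterministic policy gradient (TD3) algorithm, but also introduces an entropy regularization term during model training to promote exploration and to control the stochastic control policy. It uses adaptive noise to explore effectively at different stages of the task, which further improves learning performance and stability. Experiments show that the proposed algorithm achieves better decision-making, faster convergence, and more stable performance on the multi-ship cooperative collision avoidance problem.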

English title: Multi-ship collaborative collision avoidance strategy based on multi-agent deep reinforcement learning

English abstract: To improve the coordination, safety, practicability, and energy efficiency of intelligent collision avoidance strategies for multi-ship encounters, a Prioritized Experience Replay-Multi-Agent Softmax Deep Double Deterministic Policy Gradient (PER-MASD3) algorithm is proposed, which combines the prioritized experience replay mechanism with the Centralized Training with Decentralized Execution (CTDE) framework to solve the multi-ship cooperative collision avoidance problem. The algorithm not only addresses the value estimation bias of the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, but also introduces an entropy regularization term during model training to promote exploration and to control the stochastic control policy. Adaptive noise is adopted to explore effectively at different stages of the task, further improving the learning performance and stability of the algorithm. Experiments show that the proposed PER-MASD3 algorithm achieves better decision-making, faster convergence, and more stable performance on the multi-ship collaborative collision avoidance problem.
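Two building blocks named in the abstract, prioritized experience replay and a Softmax-weighted value estimate, can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no formulas, so the code assumes the commonly used proportional prioritization scheme and a Boltzmann-weighted average of Q-values. The function names and the values of `alpha` and `beta` are hypothetical and chosen only for illustration.

```python
import math
import random


def per_probabilities(priorities, alpha=0.6):
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha.

    alpha = 0 gives uniform sampling; larger alpha favors high-priority items.
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights w_i = (N * P(i))^(-beta), scaled so max(w) = 1.

    These weights offset the bias introduced by non-uniform sampling.
    Every probability must be strictly positive.
    """
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    largest = max(raw)
    return [w / largest for w in raw]


def softmax_value(q_values, beta=1.0):
    """Softmax-weighted average of Q-values: sum_i softmax(beta * q)_i * q_i.

    For beta >= 0 the result lies between the mean (beta = 0) and the max
    (beta -> infinity) of the inputs.
    """
    m = max(q_values)  # subtract the max before exponentiating, for stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    z = sum(exps)
    return sum(e * q for e, q in zip(exps, q_values)) / z


if __name__ == "__main__":
    # Hypothetical TD errors for four stored transitions.
    td_errors = [0.1, 0.5, 2.0, 0.05]
    # A small constant keeps every priority strictly positive.
    priorities = [abs(e) + 1e-6 for e in td_errors]

    probs = per_probabilities(priorities)
    weights = importance_weights(probs)

    # Draw a mini-batch of indices according to the priorities.
    batch = random.choices(range(len(priorities)), weights=probs, k=2)
    print("sampled indices:", batch)
    print("their IS weights:", [weights[i] for i in batch])
    print("softmax value:", softmax_value([1.0, 2.0, 3.0], beta=1.0))
```

In this sketch, transitions with larger absolute TD error are sampled more often, and the importance weights scale down their contribution to the loss. The `beta` in `softmax_value` interpolates between averaging the Q-values and taking their maximum, which is one way to trade off under- and overestimation.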

English keywords:

multi-agent deep reinforcement learning; coordinated collision avoidance; centralized training with decentralized execution; prioritized experience replay; multi-agent Softmax deep double deterministic policy gradient

Authors:

Huang Renxian (黄仁贤), Luo Liang (罗亮)

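Author affiliations:

Key Laboratory of High Performance Ship Technology (Ministry of Education), Wuhan University of Technology, Wuhan, Hubei 430064

School of Naval Architecture, Ocean and Energy Power Engineering, Wuhan University of Technology, Wuhan, Hubei 430064

Sanya Science and Education Innovation Park, Wuhan University of Technology, Sanya, Hainan 572019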

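Keywords (translated from Chinese):

multi-agent deep reinforcement learning; cooperative collision avoidance; centralized training with decentralized execution; prioritized experience replay; multi-agent Softmax deep double deterministic policy gradient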

Funding:

National Natural Science Foundation of China

Project number:

52101368

Publication year:

2024

DOI:

10.13196/j.cims.2023.0382

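Journal: Computer Integrated Manufacturing Systems (计算机集成制造系统)

Sponsor: No. 210 Research Institute of China North Industries Group (中国兵器工业集团第210研究所)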

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 1.092

ISSN: 1006-5911

Year, volume (issue): 2024, 30(6)

Number of references: 4