Multi-ship collaborative collision avoidance strategy based on multi-agent deep reinforcement learning

To improve the coordination, safety, practicality, and energy efficiency of intelligent collision avoidance strategies for multi-ship encounters, a Prioritized Experience Replay Multi-Agent Softmax Deep Double Deterministic Policy Gradient (PER-MASD3) algorithm is proposed. It combines the prioritized experience replay mechanism with the Centralized Training with Decentralized Execution (CTDE) framework to solve the multi-ship cooperative collision avoidance problem. The algorithm addresses the value estimation bias of the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and introduces an entropy regularization term during model training to promote exploration and to regulate the stochastic control policy. Adaptive noise is used to explore effectively at different stages of the task, which further improves the learning performance and stability of the algorithm. Experiments show that, on the multi-ship cooperative collision avoidance problem, the proposed PER-MASD3 algorithm achieves better decision-making, faster convergence, and more stable performance.
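
The abstract names two mechanisms without giving formulas: a Softmax operator for value estimation and prioritized experience replay. The sketch below illustrates the standard textbook forms of both, as an aid to reading only. It is not taken from the paper: the function names `softmax_value` and `per_probabilities`, the parameters `beta`, `alpha` and `eps`, and their default values are all assumptions, and the paper's actual formulation may differ.

```python
import math


def softmax_value(q_values, beta):
    """Boltzmann-softmax weighted average of a list of Q-value estimates.

    Illustrative assumption, not the paper's code: beta = 0 gives the plain
    mean, and a large beta approaches the maximum.
    """
    if not q_values:
        raise ValueError("q_values must be non-empty")
    q_max = max(q_values)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritized-replay sampling probabilities.

    Priority is (|TD error| + eps) ** alpha; alpha = 0 recovers uniform
    sampling. The defaults are hypothetical illustration values.
    """
    if not td_errors:
        raise ValueError("td_errors must be non-empty")
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


if __name__ == "__main__":
    print(softmax_value([1.0, 2.0, 3.0], beta=1.0))
    print(per_probabilities([0.1, 1.0, 5.0]))
```

Because the softmax-weighted value lies between the mean and the maximum of the estimates, `beta` offers a single knob between averaging and taking the maximum, which is one way to see how such an operator could influence estimation bias. In the replay sketch, transitions with larger absolute TD error are sampled more often.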

Keywords: multi-agent deep reinforcement learning; coordinated collision avoidance; centralized training with decentralized execution; prioritized experience replay; multi-agent Softmax deep double deterministic policy gradient

Huang Renxian (黄仁贤), Luo Liang (罗亮)

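Key Laboratory of High Performance Ship Technology (Wuhan University of Technology), Ministry of Education, Wuhan 430064, Hubei, China

School of Naval Architecture, Ocean and Energy Power Engineering, Wuhan University of Technology, Wuhan 430064, Hubei, China

Sanya Science and Education Innovation Park, Wuhan University of Technology, Sanya 572019, Hainan, China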

Supported by the National Natural Science Foundation of China (No. 52101368)

2024

Computer Integrated Manufacturing Systems (计算机集成制造系统)
No. 210 Research Institute of China North Industries Group Corporation

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.092
ISSN: 1006-5911
Year, volume (issue): 2024, 30(6)