火力与指挥控制 (Fire Control & Command Control), 2024, Vol. 49, Issue 11: 193-198. DOI: 10.3969/j.issn.1002-0640.2024.11.026

Proximal Policy Optimization-based Mission Planning Method for Surface Unmanned Boat Clusters

LIU Jiangshan, PENG Pengfei
Author information

  • 1. College of Electronic Engineering, Naval University of Engineering, Wuhan 430030, China

Abstract

For the multi-surface unmanned vessel (USV) mission planning problem, a deep reinforcement learning mission planning method based on proximal policy optimization is proposed. Taking the task of striking targets in an enemy port with a swarm of unmanned boats as the research object, the task decision-making problem is abstracted into a reasonable and effective Markov decision process, and an intelligent planning model based on the proximal policy optimization (PPO) algorithm is established. Policy training techniques such as advantage normalization, reward scaling, and a policy-entropy bonus are introduced to improve the learning performance and generalization ability of the PPO model. Simulation results show that a friendly USV swarm driven by the PPO algorithm can effectively cooperate to strike enemy targets, demonstrating the effectiveness of the proposed PPO model in mission decision-making.
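The three training techniques named in the abstract can be sketched in isolation as below. This is a minimal illustrative sketch, not the authors' implementation: the function names, the running-statistics reward scaler, and the clipping/entropy coefficients are all assumptions for demonstration.

```python
import numpy as np

def normalize_advantages(adv, eps=1e-8):
    """Advantage normalization: shift/scale a batch of advantage
    estimates to zero mean and unit standard deviation."""
    return (adv - adv.mean()) / (adv.std() + eps)

class RewardScaler:
    """Reward scaling: divide each reward by a running estimate of the
    reward standard deviation (Welford's online algorithm)."""
    def __init__(self, eps=1e-8):
        self.count, self.mean, self.m2, self.eps = 0, 0.0, 0.0, eps

    def __call__(self, r):
        self.count += 1
        delta = r - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (r - self.mean)
        if self.count < 2:          # not enough samples to estimate std yet
            return r
        std = (self.m2 / (self.count - 1)) ** 0.5
        return r / (std + self.eps)

def ppo_loss(ratio, adv, probs, clip=0.2, ent_coef=0.01):
    """PPO clipped surrogate objective with a policy-entropy bonus.
    ratio: pi_new(a|s) / pi_old(a|s); adv: (normalized) advantages;
    probs: action distributions of the current policy.
    Returns a loss to minimize (negated objective)."""
    surr = np.minimum(ratio * adv,
                      np.clip(ratio, 1 - clip, 1 + clip) * adv)
    entropy = -(probs * np.log(probs + 1e-8)).sum(axis=-1)
    return -(surr.mean() + ent_coef * entropy.mean())
```

The entropy bonus keeps the swarm's action distribution from collapsing too early, while normalization and scaling stabilize the gradient magnitudes across training batches.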

Key words

deep reinforcement learning / Markov decision process / unmanned surface vessel swarm / mission planning


Publication year: 2024
Journal: 火力与指挥控制 (Fire Control & Command Control)
Sponsors: Fire Control & Command Control Research Society; Fire Control & Command Control Professional Information Network
Indexed in: CSTPCD, CSCD, Peking University Core Journals (北大核心)
Impact factor: 0.312
ISSN: 1002-0640