火力与指挥控制 (Fire Control & Command Control), 2024, Vol. 49, Issue 11: 193-198. DOI: 10.3969/j.issn.1002-0640.2024.11.026

Proximal Policy Optimization-based Mission Planning Method for Surface Unmanned Boat Clusters

LIU Jiangshan, PENG Pengfei
Author information

  • 1. College of Electronic Engineering, Naval University of Engineering, Wuhan 430030, China

Abstract

For the multi-surface unmanned vessel (USV) mission planning problem, a deep reinforcement learning mission planning method based on proximal policy optimization is proposed. Taking the task of striking targets in an enemy port with a swarm of unmanned boats as the research object, the task decision-making problem is abstracted into a reasonable and effective Markov decision process, and an intelligent planning model based on the proximal policy optimization (PPO) algorithm is established. Policy training techniques such as advantage normalization, reward scaling, and a policy-entropy bonus are introduced to improve the learning performance and generalization ability of the PPO model. Simulation results show that a friendly USV swarm driven by the PPO algorithm can effectively cooperate to strike enemy targets, demonstrating the effectiveness of the proposed PPO model in mission decision-making.
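The three training techniques named in the abstract can be sketched in isolation as below. This is a minimal illustrative sketch, not the authors' implementation: the function names, the running-statistics reward scaler, and the clipping/entropy coefficients are all assumptions for demonstration.

```python
import numpy as np

def normalize_advantages(adv, eps=1e-8):
    """Advantage normalization: shift/scale a batch of advantage
    estimates to zero mean and unit standard deviation."""
    return (adv - adv.mean()) / (adv.std() + eps)

class RewardScaler:
    """Reward scaling: divide each reward by a running estimate of the
    reward standard deviation (Welford's online algorithm)."""
    def __init__(self, eps=1e-8):
        self.count, self.mean, self.m2, self.eps = 0, 0.0, 0.0, eps

    def __call__(self, r):
        self.count += 1
        delta = r - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (r - self.mean)
        if self.count < 2:          # not enough samples to estimate std yet
            return r
        std = (self.m2 / (self.count - 1)) ** 0.5
        return r / (std + self.eps)

def ppo_loss(ratio, adv, probs, clip=0.2, ent_coef=0.01):
    """PPO clipped surrogate objective with a policy-entropy bonus.
    ratio: pi_new(a|s) / pi_old(a|s); adv: (normalized) advantages;
    probs: action distributions of the current policy.
    Returns a loss to minimize (negated objective)."""
    surr = np.minimum(ratio * adv,
                      np.clip(ratio, 1 - clip, 1 + clip) * adv)
    entropy = -(probs * np.log(probs + 1e-8)).sum(axis=-1)
    return -(surr.mean() + ent_coef * entropy.mean())
```

The entropy bonus keeps the swarm's action distribution from collapsing too early, while normalization and scaling stabilize the gradient magnitudes across training batches.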

Key words

deep reinforcement learning / Markov decision process / unmanned surface vessel swarm / mission planning


Publication year: 2024
Journal: 火力与指挥控制 (Fire Control & Command Control)
Sponsors: Fire Control & Command Control Research Society; Fire Control & Command Control Professional Information Network
Indexed in: CSTPCD, CSCD, Peking University Core Journals (北大核心)
Impact factor: 0.312
ISSN: 1002-0640