Control Theory & Applications, 2024, Vol. 41, Issue (6): 990-998. DOI: 10.7641/CTA.2023.20696

Missile-target assignment method of naval ship based on deep reinforcement learning

Xiao Yougang (1), Jin Shengcheng (1), Mao Xiao (1), Wu Guohua (1), Lu Zhifeng (2)

Author information

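  • 1. School of Traffic and Transportation Engineering, Central South University, Changsha, Hunan 410018
  • 2. Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109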

Abstract

To address the missile-target assignment problem in shipborne air and missile defense under adversarial conditions, this paper proposes a deep reinforcement learning algorithm that incorporates an attention mechanism. First, a missile-target assignment model is constructed for a naval ship carrying multiple types of missiles, and, taking into account that targets may be intercepted over multiple waves, the problem is formulated as a Markov decision process. Next, the reinforcement learning policy network is built on an encoder-decoder architecture: the encoder embeds the targets using multi-head attention, and the decoder combines the global embedding of all targets with the embedding of each individual target to generate a reliable missile-target assignment scheme. Finally, simulation experiments evaluate the profit of the assignment schemes, the computation time, and the training process of the policy network. The results show that the proposed method generates missile-target assignment schemes with higher profit than the baselines, reduces computation time on large-scale problems by 10% to 94% compared with the baseline algorithms, and its policy network converges quickly and stably.
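The abstract names the method's components (an assignment model, an MDP over sequential allocation decisions, and an attention-based encoder-decoder policy) without giving their equations. The following is a minimal sketch of how such pieces typically fit together, not the authors' implementation. It assumes the classic static weapon-target assignment (WTA) objective, with threat values `values[j]` and single-shot kill probabilities `kill_prob[i][j]`, in place of the paper's multi-type, multi-wave model; it uses random, untrained weights instead of a trained policy; and it adds a hypothetical state-dependent bias to the decoder logits. All function names are illustrative.

```python
import math
import random


def matmul(a, w):
    """(n x d_in) @ (d_in x d_out) for matrices stored as lists of rows."""
    return [[sum(x * w[k][c] for k, x in enumerate(row)) for c in range(len(w[0]))]
            for row in a]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)]
            for _ in range(rows)]


def multi_head_self_attention(h, heads, rng):
    """One self-attention layer: head outputs are concatenated, then added back as a
    residual. Weights are random (untrained); no output projection, for brevity."""
    n, d = len(h), len(h[0])
    dk = d // heads
    out = [[0.0] * d for _ in range(n)]
    for hd in range(heads):
        q, k, v = (matmul(h, rand_matrix(rng, d, dk)) for _ in range(3))
        for i in range(n):
            att = softmax([sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(dk)
                           for j in range(n)])
            for c in range(dk):
                out[i][hd * dk + c] = sum(att[j] * v[j][c] for j in range(n))
    return [[x + y for x, y in zip(hi, oi)] for hi, oi in zip(h, out)]


def wta_value(values, kill_prob, assignment):
    """Static WTA objective (assumed): sum_j v_j * (1 - prod_{i -> j} (1 - p_ij))."""
    survive = [1.0] * len(values)
    for i, j in enumerate(assignment):
        survive[j] *= 1.0 - kill_prob[i][j]
    return sum(v * (1.0 - s) for v, s in zip(values, survive))


def greedy_decode(values, kill_prob, d_model=8, heads=2, seed=0):
    """Assign missiles one at a time (one MDP step each) with an attention-encoded policy."""
    rng = random.Random(seed)
    n_missiles, n_targets = len(kill_prob), len(values)
    # Static target features: threat value and mean kill probability against the target.
    feats = [[values[j], sum(kill_prob[i][j] for i in range(n_missiles)) / n_missiles]
             for j in range(n_targets)]
    h = multi_head_self_attention(matmul(feats, rand_matrix(rng, 2, d_model)), heads, rng)
    g = [sum(col) / n_targets for col in zip(*h)]       # global embedding (mean of targets)
    q = matmul([g], rand_matrix(rng, d_model, d_model))[0]
    survive = [1.0] * n_targets                         # MDP state: target survival probs
    assignment, rewards = [], []
    for i in range(n_missiles):
        # Logit = global-vs-target compatibility + hypothetical state-dependent bias.
        logits = [sum(a * b for a, b in zip(q, h[j])) / math.sqrt(d_model)
                  + math.log(kill_prob[i][j] * survive[j] + 1e-9)
                  for j in range(n_targets)]
        probs = softmax(logits)                          # policy over targets
        j = max(range(n_targets), key=probs.__getitem__)  # greedy action (argmax)
        rewards.append(values[j] * survive[j] * kill_prob[i][j])  # expected value destroyed
        survive[j] *= 1.0 - kill_prob[i][j]
        assignment.append(j)
    return assignment, rewards


if __name__ == "__main__":
    values = [5.0, 3.0, 8.0]
    kill_prob = [[0.7, 0.5, 0.6], [0.4, 0.8, 0.5], [0.6, 0.6, 0.9], [0.5, 0.3, 0.7]]
    assignment, rewards = greedy_decode(values, kill_prob)
    print(assignment, sum(rewards), wta_value(values, kill_prob, assignment))
```

Each decoding step is one MDP transition: the state is the vector of target survival probabilities, the action is the target chosen for the next missile, and the reward is the expected threat value destroyed, so the per-step rewards sum exactly to the WTA objective of the final assignment. Because a target may be chosen more than once, no masking is applied, which loosely mirrors repeated interception of the same target. The paper trains its policy network with reinforcement learning; that training loop is omitted here.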

Key words

air defense and anti-missile/missile-target allocation/weapon-target allocation/deep reinforcement learning


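Publication year

2024
Control Theory & Applications
South China University of Technology; Academy of Mathematics and Systems Science, Chinese Academy of Sciences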

Indexed in: CSTPCD, CSCD, PKU Core Journals
Impact factor: 1.076
ISSN: 1000-8152
Number of references: 6