Journal of Systems Engineering and Electronics, 2024, Vol. 35, Issue 6: 1516-1529. DOI: 10.23919/JSEE.2024.000062

Tactical reward shaping for large-scale combat by multi-agent reinforcement learning

DUO Nanxun¹, WANG Qinzhao¹, LYU Qiang², WANG Wei³

Author information

  • 1. Department of Weapon and Control, Academy of Army Armored Forces, Beijing 100072, China
  • 2. Beijing South Technology Co., Ltd., Beijing 100176, China
  • 3. Beijing Special Vehicle Institute, Beijing 100072, China

Abstract

Future unmanned battles urgently require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, due to the complexity of combat operations and the large size of the combat group, this task suffers from the credit assignment problem more than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions do not change the Nash equilibrium in stochastic games, providing theoretical support for their use. According to the characteristics of combat operations, we propose tactical reward shaping (TRS), which comprises maneuver shaping advice and threat assessment-based attack shaping advice. Then, we investigate the effects of different types and combinations of shaping advice on combat policies through experiments. The results show that TRS improves both the efficiency and attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance relative to the baseline strategy.

Key words

deep reinforcement learning / multi-agent reinforcement learning / multi-agent combat / unmanned battle / reward shaping

Publication year

2024
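Journal of Systems Engineering and Electronics
Defense Technology Academy of China Aerospace Science and Industry Corporation; Chinese Society of Astronautics; Systems Engineering Society of China; China Simulation Federation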

CSTPCD
Impact factor: 0.64
ISSN:1004-4132