Tactical reward shaping for large-scale combat by multi-agent reinforcement learning

DUO Nanxun¹, WANG Qinzhao¹, LYU Qiang², WANG Wei³

Author information

1. Department of Weapon and Control, Academy of Army Armored Forces, Beijing 100072, China
2. Beijing South Technology Co., Ltd., Beijing 100176, China
3. Beijing Special Vehicle Institute, Beijing 100072, China

Abstract

Future unmanned battles urgently require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, due to the complexity of combat operations and the large size of the combat group, this task suffers from the credit assignment problem more than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions do not change the Nash equilibrium in stochastic games, providing theoretical support for their use. According to the characteristics of combat operations, we propose tactical reward shaping (TRS), which comprises maneuver shaping advice and threat assessment-based attack shaping advice. Then, we investigate the effects of different types and combinations of shaping advice on combat policies through experiments. The results show that TRS improves both the efficiency and attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance relative to the baseline strategy.
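As a rough illustration of how several pieces of shaping advice might be combined, the sketch below assumes each one takes the standard potential-based form F(s, s') = γΦ(s') − Φ(s). The abstract does not state the form used in the paper, and the `State` fields, the two potential functions, and the weights are hypothetical placeholders, not the authors' TRS definitions.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class State:
    distance_to_target: float  # hypothetical maneuver feature
    threat_level: float        # hypothetical threat-assessment feature


Potential = Callable[[State], float]


def maneuver_potential(s: State) -> float:
    # Hypothetical: being closer to the target gives a higher potential.
    return -s.distance_to_target


def threat_potential(s: State) -> float:
    # Hypothetical: facing less threat gives a higher potential.
    return -s.threat_level


def shaped_reward(
    env_reward: float,
    s: State,
    s_next: State,
    potentials: Sequence[Potential],
    weights: Sequence[float],
    gamma: float = 0.99,
) -> float:
    # Add a weighted sum of potential-based shaping terms to the environment reward.
    bonus = sum(
        w * (gamma * phi(s_next) - phi(s))
        for phi, w in zip(potentials, weights)
    )
    return env_reward + bonus
```

Under this assumption, a weighted sum of potential-based terms is itself potential-based, with potential ΣwΦ, so with γ = 1 the shaping bonuses cancel along any path that returns to its starting state. This is only an intuition for why combining shaping functions can leave the underlying solution unchanged; the paper's own proof covers the stochastic-game setting.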

Key words

deep reinforcement learning / multi-agent reinforcement learning / multi-agent combat / unmanned battle / reward shaping

Publication year

2024

Journal of Systems Engineering and Electronics

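Defense Technology Academy of China Aerospace Science and Industry Corporation; Chinese Society of Astronautics; Systems Engineering Society of China; China Simulation Federation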

CSTPCD

Impact factor: 0.64

ISSN: 1004-4132
