Tactical reward shaping for large-scale combat by multi-agent reinforcement learning

Future unmanned battles urgently require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, because combat operations are complex and the combat group is large, this task suffers from the credit assignment problem more than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions do not change the Nash equilibrium in stochastic games, providing theoretical support for their use. Based on the characteristics of combat operations, we propose tactical reward shaping (TRS), which comprises maneuver shaping advice and threat assessment-based attack shaping advice. We then investigate, through experiments, the effects of different types and combinations of shaping advice on combat policies. The results show that TRS improves both the efficiency and the attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance relative to the baseline strategy.
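As an illustration of how several shaping functions can be combined, the sketch below adds the terms of multiple potential functions to a base reward. It assumes the standard potential-based form, F(s, s') = γΦ(s') − Φ(s); the abstract does not state the exact form used in TRS. The state fields, the two potential functions, and the name `shaped_reward` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical state representation: a mapping of named features to floats.
State = Mapping[str, float]
Potential = Callable[[State], float]


def shaped_reward(
    base_reward: float,
    state: State,
    next_state: State,
    potentials: Sequence[Potential],
    gamma: float = 0.99,
) -> float:
    """Add the sum of potential-based shaping terms to the base reward.

    Each term has the form gamma * phi(next_state) - phi(state).
    This form is an assumption, not taken from the paper.
    """
    shaping = sum(gamma * phi(next_state) - phi(state) for phi in potentials)
    return base_reward + shaping


# Hypothetical "maneuver" potential: being closer to the target is better.
def maneuver_potential(state: State) -> float:
    return -state["distance_to_target"]


# Hypothetical "attack" potential: aiming at a higher-threat target is better.
def attack_potential(state: State) -> float:
    return state["target_threat"]


s = {"distance_to_target": 10.0, "target_threat": 0.2}
s_next = {"distance_to_target": 8.0, "target_threat": 0.5}
r = shaped_reward(0.0, s, s_next, [maneuver_potential, attack_potential], gamma=1.0)
```

With `gamma=1.0`, the shaping terms telescope along a trajectory, so a path that returns to its starting state accumulates zero net shaping reward. This is the property that usually underlies arguments that such shaping leaves the optimal behavior unchanged.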

deep reinforcement learning; multi-agent reinforcement learning; multi-agent combat; unmanned battle; reward shaping

DUO Nanxun, WANG Qinzhao, LYU Qiang, WANG Wei

Department of Weapon and Control, Academy of Army Armored Forces, Beijing 100072, China

Beijing South Technology Co., Ltd., Beijing 100176, China

Beijing Special Vehicle Institute, Beijing 100072, China

2024

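Journal of Systems Engineering and Electronics
Sponsors: Defense Technology Academy of China Aerospace Science and Industry Corporation; Chinese Society of Astronautics; Systems Engineering Society of China; China Simulation Federation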

CSTPCD
Impact factor: 0.64
ISSN: 1004-4132
Year, volume (issue): 2024, 35(6)