Chinese Journal of Aeronautics, 2024, Vol. 37, Issue 6: 293-306. DOI: 10.1016/j.cja.2024.03.030

MADRL-based UAV swarm non-cooperative game under incomplete information

Ershen WANG 1, Fan LIU 1, Chen HONG 2, Jing GUO 1, Lin ZHAO 3, Jian XUE 3, Ning HE 4
Author Information

  • 1. School of Electronic and Information Engineering,Shenyang Aerospace University,Shenyang 110136,China
  • 2. College of Robotics,Beijing Union University,Beijing 100101,China
  • 3. School of Engineering Science,University of Chinese Academy of Sciences,Beijing 100049,China
  • 4. College of Smart City,Beijing Union University,Beijing 100101,China

Abstract

Unmanned Aerial Vehicles (UAVs) play an increasingly important role on the modern battlefield. In this paper, considering the incomplete observation information available to an individual UAV in a complex combat environment, we put forward a UAV swarm non-cooperative game model based on Multi-Agent Deep Reinforcement Learning (MADRL), in which the state space and action space are constructed to match the real features of UAV swarm air-to-air combat. The multi-agent particle environment is employed to generate a UAV combat scene with a continuous observation space. Several recently popular MADRL methods are compared extensively within the UAV swarm non-cooperative game model; the results indicate that Multi-Agent Soft Actor-Critic (MASAC) outperforms the other MADRL methods by a large margin. A UAV swarm employing MASAC can learn more effective policies and obtain a much higher hit rate and win rate. Simulations under different swarm sizes and UAV physical parameters are also performed, which implies that MASAC generalizes well. Furthermore, the practicability and convergence of MASAC are addressed by investigating the loss value of the Q-value network of each individual UAV; the results demonstrate that MASAC is of good practicability and that the Nash equilibrium of the UAV swarm non-cooperative game under incomplete information can be reached.
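The abstract gauges convergence through the loss of each UAV's Q-value network. As a minimal illustration only (not the authors' implementation, and with all numbers invented), the sketch below shows the entropy-regularized Bellman target on which a SAC-style critic loss of this kind is typically built, applied per agent:

```python
import numpy as np

def soft_bellman_target(reward, done, q1_next, q2_next, logp_next,
                        gamma=0.99, alpha=0.2):
    """Entropy-regularized TD target: r + gamma * (min(Q1', Q2') - alpha * log pi).

    The twin-Q minimum and the -alpha*log pi entropy bonus are the standard
    Soft Actor-Critic ingredients; in a multi-agent variant each UAV's critic
    would receive joint observations and actions.
    """
    v_next = np.minimum(q1_next, q2_next) - alpha * logp_next
    return reward + gamma * (1.0 - done) * v_next

def critic_loss(q_pred, target):
    """Mean-squared Bellman error minimized by an agent's Q-value network."""
    return float(np.mean((q_pred - target) ** 2))

# Toy two-transition batch for one hypothetical UAV agent.
r = np.array([1.0, 0.0])            # rewards
d = np.array([0.0, 1.0])            # episode-termination flags
q1n = np.array([5.0, 3.0])          # target critic 1 at next state
q2n = np.array([4.0, 6.0])          # target critic 2 at next state
lp = np.array([-1.0, -0.5])         # log-prob of next action under the policy

y = soft_bellman_target(r, d, q1n, q2n, lp)
loss = critic_loss(np.array([4.0, 0.0]), y)
```

Driving this loss toward zero for every agent is what the paper's per-UAV loss curves track when arguing that the non-cooperative game approaches a Nash equilibrium.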

Key words

UAV swarm; Reinforcement learning; Deep learning; Multi-agent; Non-cooperative game; Nash equilibrium


Funding

National Key R&D Program of China(2018AAA0100804)

National Natural Science Foundation of China(62173237)

Academic Research Projects of Beijing Union University,China(SK160202103)

Academic Research Projects of Beijing Union University,China(ZK50201911)

Academic Research Projects of Beijing Union University,China(ZK30202107)

Academic Research Projects of Beijing Union University,China(ZK30202108)

SongShan Laboratory Foundation,China(YYJC062022017)

Applied Basic Research Programs of Liaoning Province,China(2022020502-JH2/1013)

Applied Basic Research Programs of Liaoning Province,China(2022JH2/101300150)

Special Funds program of Civil Aircraft,China(01020220627066)

Special Funds program of Shenyang Science and Technology,China(22-322-3-34)

Publication Year

2024
Chinese Journal of Aeronautics
Chinese Society of Aeronautics and Astronautics
Indexed in: CSTPCD, EI
Impact factor: 0.847
ISSN: 1000-9361
References: 5