An Algorithm for UAV Pursuit-Evasion Game Based on MADDPG and Contrastive Learning

To solve the pursuit-evasion game problem of unmanned aerial vehicles (UAVs) in complex combat environments, a Markov model is established, and reward functions for both pursuer and evader are designed under the zero-sum game concept. A centralized-training, distributed-execution framework is constructed for the multi-agent deep deterministic policy gradient (MADDPG) algorithm to solve for the Nash equilibrium of the pursuit-evasion game. To address the difficulty of analytically representing the high-dimensional capture (escape) regions characterized by the initial positions of the pursuers and evaders, a deep contrastive learning algorithm built on the MADDPG game network represents these regions indirectly through the construction and training of a Siamese network. Simulation results show that the MADDPG algorithm can effectively obtain the Nash equilibrium solution of the UAV pursuit-evasion game under the given conditions, and that the combination of the contrastive learning algorithm with the converged MADDPG network achieves a 95% accuracy rate in representing the high-dimensional capture (escape) regions.
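The contrastive-learning step described above pairs initial conditions whose capture/escape labels come from rolling out the converged MADDPG policies, and trains a Siamese network so that same-label pairs embed close together while different-label pairs are pushed at least a margin apart. A minimal NumPy sketch of that margin-based contrastive loss with a shared embedding branch (all dimensions, names, and values here are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: each sample is a vector of initial positions of
# the pursuers and the evader (e.g. three UAVs x 2-D positions = 6-D).
IN_DIM, EMB_DIM, MARGIN = 6, 4, 1.0
W = rng.normal(scale=0.1, size=(IN_DIM, EMB_DIM))  # shared (Siamese) weights

def embed(x):
    """Shared embedding branch applied to both inputs of a pair."""
    return np.tanh(x @ W)

def contrastive_loss(x1, x2, same):
    """same=1 (both capture or both escape) pulls the pair together;
    same=0 pushes the embeddings at least MARGIN apart."""
    d = np.linalg.norm(embed(x1) - embed(x2))
    return same * d**2 + (1 - same) * max(0.0, MARGIN - d)**2

# Toy pair of initial conditions; in the paper's setting the labels would be
# produced by simulating the converged MADDPG equilibrium policies.
a, b = rng.normal(size=IN_DIM), rng.normal(size=IN_DIM)
print(contrastive_loss(a, b, same=1))  # squared embedding distance
print(contrastive_loss(a, a, same=1))  # identical inputs -> 0.0
```

In a full implementation the linear-plus-tanh branch would be a deep network trained by gradient descent on many labelled pairs; classifying a new initial condition by its embedding distance to labelled anchors is what yields the indirect representation of the capture (escape) region.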

Unmanned aerial vehicle (UAV); Pursuit-evasion game; Multi-agent; Reinforcement learning; Nash equilibrium; Deep contrastive learning

WANG Ruobing, WANG Xiaofang


School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China


National Natural Science Foundation of China

11502019

2024

Journal of Astronautics
Chinese Society of Astronautics

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.887
ISSN:1000-1328
Year, Volume (Issue): 2024, 45(2)