An Algorithm for UAV Pursuit-Evasion Game Based on MADDPG and Contrastive Learning

To solve the pursuit-evasion game problem of unmanned aerial vehicles (UAVs) in complex combat environments, a Markov model is established, and reward functions for both pursuer and evader are designed under the zero-sum game concept. A centralized-training, distributed-execution framework is constructed for the multi-agent deep deterministic policy gradient (MADDPG) algorithm to solve for the Nash equilibrium of the pursuit-evasion game. To address the difficulty of analytically representing the high-dimensional capture (escape) regions characterized by the initial positions of the pursuers and evaders, a deep contrastive learning algorithm built on the MADDPG game network represents these regions indirectly through the construction and training of a Siamese network. Simulation results show that the MADDPG algorithm can effectively obtain the Nash equilibrium solution of the UAV pursuit-evasion game under the given conditions, and that the combination of the contrastive learning algorithm with the converged MADDPG network achieves a 95% accuracy rate in representing the high-dimensional capture (escape) regions.
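The contrastive-learning step described above pairs initial conditions whose capture/escape labels come from rolling out the converged MADDPG policies, and trains a Siamese network so that same-label pairs embed close together while different-label pairs are pushed at least a margin apart. A minimal NumPy sketch of that margin-based contrastive loss with a shared embedding branch (all dimensions, names, and values here are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: each sample is a vector of initial positions of
# the pursuers and the evader (e.g. three UAVs x 2-D positions = 6-D).
IN_DIM, EMB_DIM, MARGIN = 6, 4, 1.0
W = rng.normal(scale=0.1, size=(IN_DIM, EMB_DIM))  # shared (Siamese) weights

def embed(x):
    """Shared embedding branch applied to both inputs of a pair."""
    return np.tanh(x @ W)

def contrastive_loss(x1, x2, same):
    """same=1 (both capture or both escape) pulls the pair together;
    same=0 pushes the embeddings at least MARGIN apart."""
    d = np.linalg.norm(embed(x1) - embed(x2))
    return same * d**2 + (1 - same) * max(0.0, MARGIN - d)**2

# Toy pair of initial conditions; in the paper's setting the labels would be
# produced by simulating the converged MADDPG equilibrium policies.
a, b = rng.normal(size=IN_DIM), rng.normal(size=IN_DIM)
print(contrastive_loss(a, b, same=1))  # squared embedding distance
print(contrastive_loss(a, a, same=1))  # identical inputs -> 0.0
```

In a full implementation the linear-plus-tanh branch would be a deep network trained by gradient descent on many labelled pairs; classifying a new initial condition by its embedding distance to labelled anchors is what yields the indirect representation of the capture (escape) region.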

Unmanned aerial vehicle (UAV); Pursuit-evasion game; Multi-agent; Reinforcement learning; Nash equilibrium; Deep contrastive learning

WANG Ruobing, WANG Xiaofang


School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China


National Natural Science Foundation of China

11502019

2024

Journal of Astronautics
Chinese Society of Astronautics

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.887
ISSN:1000-1328
Year, Volume (Issue): 2024, 45(2)