An Algorithm for UAV Pursuit-Evasion Game Based on MADDPG and Contrastive Learning
To solve the pursuit-evasion game problem for unmanned aerial vehicles (UAVs) in complex combat environments, a Markov model is established, and reward functions for both the pursuer and the evader are designed under the zero-sum game concept. A centralized-training, distributed-execution framework is constructed for the multi-agent deep deterministic policy gradient (MADDPG) algorithm to solve for the Nash equilibrium of the pursuit-evasion game. Because the high-dimensional capture (escape) regions, which are characterized by the initial positions of the pursuers and evaders, are difficult to represent analytically, a deep contrastive learning algorithm based on the MADDPG network is built to represent these regions indirectly through the construction and training of a Siamese network. Simulation results show that the MADDPG algorithm can obtain the Nash equilibrium solution of the UAV pursuit-evasion game under the given conditions, and that combining the contrastive learning algorithm with the converged MADDPG network represents the high-dimensional capture (escape) regions with an accuracy of 95%.
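
As an illustration of the two ingredients named above, the following minimal Python sketch shows a zero-sum reward pair and a pairwise contrastive loss of the kind commonly used to train Siamese networks. It is a sketch under stated assumptions, not the method of the paper: the abstract does not specify the loss, so the standard margin-based contrastive loss is assumed here, and the names `zero_sum_rewards`, `contrastive_loss`, `same_outcome`, and `margin` are hypothetical. The embeddings are assumed to come from a Siamese network applied to two initial configurations, and `same_outcome` is assumed to indicate whether both configurations end in the same result (capture or escape) when played out with the converged MADDPG policies.

```python
import math


def zero_sum_rewards(pursuer_reward):
    """Return (pursuer, evader) rewards under the zero-sum assumption.

    Whatever the pursuer gains, the evader loses, so the pair sums to zero.
    """
    return pursuer_reward, -pursuer_reward


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_outcome, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings (assumed form).

    Pairs with the same outcome are pulled together (loss grows with distance);
    pairs with different outcomes are pushed apart until they are at least
    `margin` apart, after which they contribute no loss.
    """
    d = euclidean_distance(emb_a, emb_b)
    if same_outcome:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


if __name__ == "__main__":
    # Hypothetical 2-D embeddings of two initial pursuer/evader configurations.
    print(zero_sum_rewards(2.0))
    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_outcome=True))
    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_outcome=False))
```

Under these assumptions, a trained embedding would place initial configurations from the capture region close to one another and far from those in the escape region, so that region membership can be read from distances in the embedding space rather than from an analytic boundary.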