Improvement and application of MCTS in turn-based orbital games
In the turn-based orbit pursuit-evasion game, the sensing delay of orbit changes makes differential game approaches difficult to apply, and algorithms based on deep reinforcement learning remain risky for engineering applications because they lack interpretability. The predictive-value-accumulate Monte Carlo tree search (PVA-MCTS) algorithm is proposed for the turn-based orbit pursuit-evasion game. Exploiting the predictability of spacecraft orbital motion, the algorithm predicts and accumulates the decision value during the game. This addresses the sparse rewards and long time spans of the turn-based orbit pursuit-evasion game and improves learning efficiency. The algorithm is applied to the turn-based orbit pursuit-evasion game, and its results are compared with those of the standard Monte Carlo tree search (MCTS) algorithm. The results show that, relative to MCTS, the PVA-MCTS algorithm reduces the pursuit time by about 27.6% for the pursuer and increases the escape time by about 6.8% for the evader. The PVA-MCTS algorithm is practical for orbital game applications such as approaching non-cooperative targets and collision avoidance.
pursuit-evasion of spacecraft; turn-based pursuit-evasion game; Monte Carlo tree search; sensing delay of orbit change; predictive value accumulate
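
The abstract does not spell out how the decision value is predicted or accumulated, so the following is only a minimal sketch of one possible reading. It assumes each tree node carries a dense predicted value (for example, one derived by propagating the orbit forward) and that backpropagation adds these values along the path on top of the sparse terminal reward. The names `Node`, `predicted_value`, `ucb1`, and `backpropagate` are hypothetical and are not taken from the paper.

```python
import math


class Node:
    """One decision point in the search tree (hypothetical structure)."""

    def __init__(self, parent=None, predicted_value=0.0):
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0
        # Assumption: a dense value estimated from predicted orbital motion.
        self.predicted_value = predicted_value
        if parent is not None:
            parent.children.append(self)


def ucb1(child, c=1.4):
    """Standard UCB1 score used for child selection in plain MCTS."""
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    mean = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(child.parent.visits) / child.visits)
    return mean + explore


def backpropagate(leaf, terminal_reward):
    """Walk from leaf to root, accumulating predicted values on the way.

    Plain MCTS would add only terminal_reward at every node; here each
    node also contributes its own predicted value to the running total.
    """
    accumulated = terminal_reward
    node = leaf
    while node is not None:
        accumulated += node.predicted_value
        node.visits += 1
        node.value_sum += accumulated
        node = node.parent


# Usage: a three-node path with a zero terminal reward (sparse-reward case).
root = Node()
mid = Node(parent=root, predicted_value=0.5)
leaf = Node(parent=mid, predicted_value=0.25)
backpropagate(leaf, terminal_reward=0.0)
```

In this sketch, nodes closer to the root still receive a non-zero signal when the terminal reward is zero, which is the kind of effect the abstract attributes to value accumulation under sparse rewards.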