Improvement and Application of MCTS in Turn-Based Orbital Games


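Chinese abstract (translated): In turn-based spacecraft pursuit-evasion games, the delay in sensing an opponent's orbit change makes differential-game methods difficult to solve, while game algorithms based on deep reinforcement learning are weakly interpretable and still carry risks in engineering use. For the turn-based spacecraft pursuit-evasion game, a predictive-value-accumulation Monte Carlo tree search (PVA-MCTS) algorithm is proposed. Exploiting the predictability of spacecraft orbital motion, the algorithm predicts and accumulates the value of decisions during the game, which addresses the sparse rewards and long time span of turn-based spacecraft pursuit-evasion games; the adaptive expansion method it adopts improves learning efficiency. The algorithm is applied to the turn-based spacecraft pursuit-evasion game and compared with the results obtained by the Monte Carlo tree search (MCTS) algorithm. The results show that PVA-MCTS shortens the pursuit time by about 27.6% for the pursuing spacecraft and extends the escape time by about 6.8% for the evading spacecraft. The proposed algorithm can help accelerate the practical application of orbital game techniques in areas such as approaching non-cooperative targets and collision avoidance.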

English title: Improvement and application of MCTS in turn-based orbital games

English abstract: The sensing delay of orbit changes in the turn-based orbital pursuit-evasion game makes differential-game approaches difficult to apply, and deep reinforcement learning-based algorithms remain risky for engineering applications because of their poor interpretability. A predictive-value-accumulation Monte Carlo tree search (PVA-MCTS) algorithm is proposed for the turn-based orbital pursuit-evasion game. Based on the predictability of spacecraft orbital motion, the algorithm predicts and accumulates the decision value during the game. This addresses the sparse rewards and long time span of the turn-based orbital pursuit-evasion game and improves learning efficiency. The algorithm is applied to the turn-based orbital pursuit-evasion game and compared with the results obtained by the Monte Carlo tree search (MCTS) algorithm. The results show that PVA-MCTS reduces the pursuit time by about 27.6% for the pursuer and increases the escape time by about 6.8% for the evader. The PVA-MCTS algorithm supports the practical application of orbital games in fields such as non-cooperative target approach and collision avoidance.
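The abstract compares PVA-MCTS against plain MCTS but does not give the algorithm's details. As a point of reference only, here is a minimal sketch of the textbook UCT form of MCTS (UCB1 selection, expansion, random simulation, backpropagation) on a hypothetical 1-D pursuit toy problem. The names `ACTIONS`, `Node`, `ucb1`, `rollout`, `is_terminal` and `mcts`, the capture reward that shrinks with capture time, and all constants are illustrative assumptions. The optional `dense=True` per-step distance penalty is a generic reward-shaping stand-in that hints at why intermediate value signals help under sparse rewards; it is not the authors' predictive value accumulation.

```python
import math
import random

# Hypothetical toy problem (not from the paper): a pursuer on an integer line
# moves -1, 0 or +1 per turn and "captures" a fixed target by landing on it.
ACTIONS = (-1, 0, 1)


class Node:
    def __init__(self, state, depth, parent=None):
        self.state = state        # pursuer position
        self.depth = depth        # turns elapsed since the root
        self.parent = parent
        self.children = {}        # action -> Node
        self.visits = 0
        self.value_sum = 0.0


def ucb1(child, parent_visits, c=1.4):
    """UCB1 score used by UCT to trade off exploitation and exploration."""
    if child.visits == 0:
        return float("inf")
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def rollout(pos, depth, target, horizon, rng, dense):
    """Random playout. The capture reward shrinks with capture time, so an
    earlier capture is better. With dense=True a per-step distance penalty is
    also accumulated (hypothetical shaping, not the paper's PVA term)."""
    total = 0.0
    while depth < horizon and pos != target:
        if dense:
            total -= 0.01 * abs(target - pos)
        pos += rng.choice(ACTIONS)
        depth += 1
    if pos == target:
        total += 1.0 - 0.05 * depth
    return total


def is_terminal(node, target, horizon):
    return node.depth >= horizon or node.state == target


def mcts(start, target, horizon=10, iterations=2000, dense=True, seed=0):
    rng = random.Random(seed)
    root = Node(start, 0)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCB1 through fully expanded non-terminal nodes.
        while len(node.children) == len(ACTIONS) and not is_terminal(node, target, horizon):
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb1(ch, parent.visits))
        # 2. Expansion: add one untried action below a non-terminal node.
        if not is_terminal(node, target, horizon):
            action = next(a for a in ACTIONS if a not in node.children)
            child = Node(node.state + action, node.depth + 1, node)
            node.children[action] = child
            node = child
        # 3. Simulation: random playout from the new leaf.
        reward = rollout(node.state, node.depth, target, horizon, rng, dense)
        # 4. Backpropagation: add the sampled return to every node on the path.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children, key=lambda a: root.children[a].visits)
```

In this toy setting, `mcts(0, 3)` returns the recommended first move for a pursuer at 0 chasing a target at 3 and should favor +1; passing `dense=False` gives the purely sparse-reward variant for comparison. Recommending the most-visited root action, rather than the highest mean, is the usual robust choice in UCT.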

English keywords:

pursuit-evasion of spacecraft; turn-based pursuit-evasion game; Monte Carlo tree search; sensing delay of orbit change; predictive value accumulation

Authors:

Zheng Xinyu (郑鑫宇), Zhang Yi (张轶), Zhou Jie (周杰), Tang Peijia (唐佩佳), Peng Shengren (彭升人), Dang Zhaohui (党朝辉)


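Affiliations:

Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing 100094

School of Astronautics, Northwestern Polytechnical University, Xi'an 710072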

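Keywords:

spacecraft pursuit-evasion; turn-based pursuit-evasion game; Monte Carlo tree search; orbit-change sensing delay; predictive value accumulation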

Funding:

National Natural Science Foundation of China

Grant No.:

12172288

Publication year:

2024

DOI:

10.16708/j.cnki.1000-758X.2024.0075

Chinese Space Science and Technology

China Academy of Space Technology


CSTPCD; Peking University Core Journal (北大核心)

Impact factor: 0.404

ISSN: 1000-758X

Year, Volume (Issue): 2024, 44(5)