Multi-Satellite Orbital Pursuit-Evasion Game Method Based on Behavior Trees
Su Hao 1, Ji Mingjiang 1, Guo Pengyu 1, Cao Lu 1
Author Information
- 1. National Innovation Institute of Defense Technology, Academy of Military Sciences, Beijing 100071, China; Intelligent Game and Decision Laboratory, Beijing 100000, China
Abstract
Multi-agent reinforcement learning is an effective approach to solving spatial pursuit-evasion games. In multi-satellite pursuit-evasion scenarios, however, it suffers from high complexity, long training times, and difficulty in converging. This paper proposes a multi-satellite orbital pursuit-evasion game method based on behavior trees, which decomposes the complex pursuit-evasion problem involving multiple targets into individual pursuit-evasion problems, each against a single target. Behavior trees are used to construct a framework for task allocation and game decision-making in multi-satellite pursuit-evasion. An optimal task allocation model is established with the objective of maximizing the probability of a successful pursuit and is solved with a genetic algorithm, enabling rapid decomposition of the multi-satellite pursuit-evasion task. For its allocated pursuit task, each satellite autonomously selects a game strategy trained with the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm to carry out its game decisions. The results demonstrate that the proposed method effectively decomposes the multi-satellite orbital game task and, driven by the behavior tree, successfully completes the pursuit of the targets.
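The task-allocation step described above can be illustrated with a minimal sketch. The probability matrix, fitness definition (overall success as the product over targets of the probability that at least one assigned pursuer captures that target), and genetic-algorithm hyperparameters below are all illustrative assumptions, not the paper's actual model:

```python
import random

# Hypothetical matrix P[i][j]: probability that pursuer i alone
# captures target j (illustrative numbers only).
P = [[0.9, 0.2, 0.1],
     [0.3, 0.8, 0.2],
     [0.2, 0.3, 0.7],
     [0.6, 0.5, 0.4]]
N_PURSUERS, N_TARGETS = len(P), len(P[0])

def fitness(assign):
    """Overall success probability of an assignment (assign[i] = target
    of pursuer i): product over targets of P(at least one assigned
    pursuer succeeds). An uncovered target drives fitness to zero."""
    total = 1.0
    for j in range(N_TARGETS):
        fail = 1.0
        for i, t in enumerate(assign):
            if t == j:
                fail *= 1.0 - P[i][j]
        total *= 1.0 - fail
    return total

def ga(pop_size=40, generations=200, mut=0.2, seed=0):
    """Simple elitist genetic algorithm over pursuer-to-target assignments."""
    rng = random.Random(seed)
    pop = [[rng.randrange(N_TARGETS) for _ in range(N_PURSUERS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]          # keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, N_PURSUERS)  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mut:              # point mutation
                child[rng.randrange(N_PURSUERS)] = rng.randrange(N_TARGETS)
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

best = ga()
print("assignment:", best, "success probability:", round(fitness(best), 3))
```

In the full method, each pursuer satellite would then execute its assigned single-target pursuit using its trained MADDPG strategy, with the behavior tree sequencing the allocation and decision nodes.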
Keywords
multi-satellite orbital pursuit-evasion game / behavior tree / task allocation / multi-agent reinforcement learning
Publication Year
2024