Autonomous Mission Planning of Collaborative Observation for Moving Targets Based on Reinforcement Learning
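As the number of space targets grows and aerial targets become increasingly dynamic, observing and locating these targets is becoming more and more important. Many highly dynamic targets must be observed simultaneously, while the constellation's observation resources are limited. To use those resources efficiently, the multi-target collaborative observation scheme must be adjusted dynamically so that every target retains good positioning accuracy, which requires solving the mission planning problem of collaborative multi-target observation by a satellite constellation. This paper establishes a constellation attitude-and-orbit model, a target flight model, and a model of collaborative target detection and positioning. It proposes a target positioning-error prediction model and a target observation priority model, both based on the geometric dilution of precision (GDOP). It then builds a reinforcement-learning-based mission planning framework for collaborative observation, in which the policy network uses a multi-head self-attention mechanism and the planning algorithm is trained with proximal policy optimization (PPO). Simulations verify that, compared with a traditional heuristic method, the proposed method improves multi-target observation accuracy and effective tracking time, and that it computes faster than a genetic algorithm.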
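To make the GDOP-based error prediction more concrete, here is a minimal Python sketch for angle-only (line-of-sight) positioning of a single target by several satellites. The abstract does not give the paper's exact formulation, so the measurement model is an assumption: every satellite measures the LOS direction with the same isotropic angular noise, and GDOP is taken as the square root of the trace of the inverse of the summed geometric information matrix. The function name `angles_only_gdop` is hypothetical.

```python
import numpy as np


def angles_only_gdop(observer_positions, target_position):
    """GDOP for locating one target from LOS (angle-only) measurements.

    Assumed model: observer i contributes information (I - u_i u_i^T) / rho_i**2,
    where u_i is the unit LOS vector from observer to target and rho_i the range.
    """
    target = np.asarray(target_position, dtype=float)
    info = np.zeros((3, 3))
    for observer in np.asarray(observer_positions, dtype=float):
        los = target - observer
        rho = np.linalg.norm(los)
        u = los / rho
        # A LOS measurement constrains only the two directions perpendicular
        # to u, and an angular error maps to a position error scaled by range.
        info += (np.eye(3) - np.outer(u, u)) / rho**2
    # Raises LinAlgError when all LOS vectors are parallel (no triangulation).
    cov = np.linalg.inv(info)
    return float(np.sqrt(np.trace(cov)))


# Two satellites at range 10 viewing the target from orthogonal directions.
print(round(angles_only_gdop([[10, 0, 0], [0, 10, 0]], [0, 0, 0]), 2))  # 15.81
# Adding a third orthogonal view lowers the GDOP.
print(round(angles_only_gdop([[10, 0, 0], [0, 10, 0], [0, 0, 10]], [0, 0, 0]), 2))  # 12.25
```

Under this assumed model, multiplying the GDOP by the angular noise standard deviation (in radians) gives an approximate position-error standard deviation in the same length unit as the positions. Adding a satellite can only add information, so the GDOP never increases, which makes it a natural quantity for comparing candidate satellite-to-target assignments.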
With the increasing number of space targets, determining their orbits is becoming increasingly important for space security. Because many highly dynamic space targets must be observed while observation resources are limited, the collaborative observation scheme must be adjusted dynamically to use constellation observation resources efficiently and to ensure good positioning accuracy for each target. This requires solving the mission planning problem of observing multiple targets with multiple observation satellites. This paper first establishes the orbit dynamics model of the flying targets, as well as a Kalman filter model of the collaborative positioning algorithm that uses line-of-sight information from multiple observation satellites. Then, a collaborative positioning accuracy estimation model and a target observation priority model, both based on the geometric dilution of precision (GDOP), are proposed. Based on these models, a mission planning framework for collaborative observation based on reinforcement learning (RL) is developed. A policy network based on a multi-head self-attention mechanism is designed to compute the planning results, and the proximal policy optimization (PPO) algorithm is adopted to train the policy network in a training environment. Simulation results show that, compared with a heuristic algorithm based on tracking priority, the proposed RL method effectively improves both the overall tracking accuracy and the total tracking time of all targets, and that it computes faster than genetic algorithms.
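The abstract names PPO as the training algorithm but not which variant is used. As an assumption, the sketch below shows the widely used clipped surrogate objective of PPO, computed from the log-probabilities of the actions taken under the current and the data-collecting policy together with their advantage estimates. How the paper defines actions, rewards, and advantages is not stated in the abstract; the function name `ppo_clip_objective` and the default `clip_eps=0.2` are illustrative choices.

```python
import numpy as np


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective of PPO (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    advantages: advantage estimates for those actions.
    """
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # The elementwise minimum removes any gain from pushing the probability
    # ratio outside [1 - eps, 1 + eps] in the direction the advantage favors.
    return float(np.mean(np.minimum(unclipped, clipped)))


# Identical policies: the objective reduces to the mean advantage.
print(ppo_clip_objective([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, -2.0, 3.0]))
# Ratio 2 with a positive advantage is capped at 1 + eps.
print(ppo_clip_objective([np.log(2.0)], [0.0], [1.0]))
```

Training would maximize this objective (or minimize its negative) with respect to the policy network's parameters; the clipping keeps each update close to the policy that generated the data, which is the stabilizing property PPO is known for.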