基于强化学习的动目标协同观测任务自主规划方法

Autonomous Mission Planning of Collaborative Observation for Moving Targets Based on Reinforcement Learning

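Chinese abstract (translated): As the number of space targets grows and aerial targets become increasingly dynamic, observing and locating these targets becomes ever more important. Many highly dynamic targets must be observed at the same time while constellation observation resources are limited, so the multi-target collaborative observation scheme has to be adjusted dynamically to use those resources efficiently and give every target good positioning accuracy. This requires solving the mission planning problem of a constellation collaboratively observing multiple targets. The paper establishes a constellation attitude and orbit model, a target flight model, and a collaborative target detection and positioning model. It proposes a target observation positioning error prediction model and a target observation priority model based on the geometric dilution of precision (GDOP). It then builds a reinforcement-learning-based collaborative observation mission planning framework, in which the policy network uses a multi-head self-attention mechanism and the planning algorithm is trained with the proximal policy optimization algorithm. Simulations show that the proposed method improves multi-target observation accuracy and effective tracking time compared with a traditional heuristic method, and computes faster than a genetic algorithm.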

English abstract: With the increasing number of space targets, orbit determination of the targets is becoming increasingly important for space security. Because the space targets to be observed are numerous and highly dynamic while observation resources are limited, the collaborative observation scheme must be adjusted dynamically to use constellation observation resources efficiently and ensure that each target has good positioning accuracy. This requires solving the mission planning problem of observing multiple targets with multiple observation satellites. This paper first establishes the orbit dynamic model of the flying targets, as well as the Kalman filter model of the collaborative positioning algorithm, which uses the line-of-sight information of different observation satellites. Then, a collaborative positioning accuracy estimation model and an observation priority model of the targets based on the geometric dilution of precision (GDOP) are proposed. Based on these models, a mission planning framework for collaborative observation based on reinforcement learning (RL) is developed. A policy network based on a multi-head self-attention mechanism is designed to compute the planning results. The proximal policy optimization (PPO) algorithm is adopted to train the policy network in a training environment. Simulation results show that, compared with a heuristic algorithm based on tracking priority, the proposed RL method effectively improves the overall tracking accuracy and the total tracking time of all targets, and that it computes faster than genetic algorithms.
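The abstract does not give the paper's GDOP-based error model, so the following is only a minimal sketch of one common way to score line-of-sight (angle-only) observation geometry, not the authors' formulation. It assumes each satellite contributes Fisher information perpendicular to its line of sight, scaled by angular noise and range. The function name `bearing_gdop` and the parameter `sigma_rad` are hypothetical.

```python
import numpy as np


def bearing_gdop(sat_positions, target, sigma_rad=1.0):
    """GDOP-like position error factor for angle-only measurements.

    Assumption (not from the paper): each line-of-sight measurement has
    isotropic angular noise sigma_rad, so it constrains the target only in
    the plane perpendicular to the line of sight.
    """
    target = np.asarray(target, dtype=float)
    info = np.zeros((3, 3))
    for sat in np.asarray(sat_positions, dtype=float):
        los = target - sat
        rng = np.linalg.norm(los)
        u = los / rng
        # An angular error sigma_rad at range rng shifts the target by about
        # sigma_rad * rng, perpendicular to the line of sight.
        perp = np.eye(3) - np.outer(u, u)
        info += perp / (sigma_rad * rng) ** 2
    # Needs at least two non-parallel lines of sight; otherwise info is singular.
    cov = np.linalg.inv(info)
    return float(np.sqrt(np.trace(cov)))


# Two satellites viewing the target from perpendicular directions (units: km).
good = bearing_gdop([[1000, 0, 0], [0, 1000, 0]], [0, 0, 0], sigma_rad=1e-3)
# Two satellites viewing the target from nearly the same direction.
bad = bearing_gdop([[1000, 0, 0], [1000, 50, 0]], [0, 0, 0], sigma_rad=1e-3)
```

In this sketch the nearly collinear pair yields a much larger value than the perpendicular pair, which is the kind of geometry sensitivity a planner could use when assigning satellites to targets.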

English keywords:

multiple targets; collaborative observing; mission planning; reinforcement learning; self-attention mechanism; proximal policy optimization algorithm

Authors:

刘一隆、张聪、张斯航、陈砺寒

Affiliation:

Beijing Institute of Control Engineering, Beijing 100094

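Keywords (translated from Chinese):

multiple targets; collaborative observation; mission planning; reinforcement learning; self-attention mechanism; proximal policy optimization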

Funding:

National Natural Science Foundation of China; National Natural Science Foundation of China

Grant numbers:

U21B60000562303048

Publication year:

2024

DOI:

10.3969/j.issn.1674-1579.2024.03.005

Aerospace Control and Application (空间控制技术与应用)

Beijing Institute of Control Engineering


CSTPCD; Peking University Core Journals

Impact factor: 0.267

ISSN: 1674-1579

Year, volume (issue): 2024, 50(3)