An Adaptive Hyperparameter Strategy Optimization Method for Spacecraft Rendezvous and Orbital Transfer

Sun Leixiang¹, Guo Yanning², Deng Wudong³, Lyu Yueyong², Ma Guangfu²

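1. Institute of Space Science and Applied Technology, Harbin Institute of Technology (Shenzhen), Shenzhen 518055
2. School of Astronautics, Harbin Institute of Technology, Harbin 150001
3. Shanghai Institute of Satellite Engineering, Shanghai 201109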

Abstract

Based on reinforcement learning (RL), this paper proposes a hyperparameter-adaptive method for optimizing fuel-optimal rendezvous and orbital transfer strategies for geosynchronous orbit (GEO) spacecraft. First, a Lambert transfer model for GEO spacecraft rendezvous is established. With the maneuver timing as the decision variable and fuel consumption as the fitness function, an improved comprehensive learning particle swarm optimization algorithm (ICLPSO) serves as the base method for transfer strategy optimization. Second, to account for both the optimality and the speed of the solution, the reward function is redesigned to use the result of the standard particle swarm optimization algorithm (PSO) as a reference baseline. A deep deterministic policy gradient (DDPG) neural network is trained on a family of typical GEO spacecraft rendezvous scenarios. Combining DDPG with ICLPSO yields a reinforcement learning particle swarm optimization algorithm (RLPSO), in which the algorithm's hyperparameters are adjusted dynamically and adaptively according to the real-time convergence of the iterations. Finally, simulation results show that, compared with PSO and the comprehensive learning particle swarm optimization algorithm (CLPSO), RLPSO produces planning results with higher fitness after fewer iterations, reducing the computational resources consumed during the iterative process.
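The core idea in the abstract is a swarm optimizer whose hyperparameters are set at every iteration by a policy that observes the current convergence state. The sketch below illustrates only that coupling, under loudly stated assumptions: `toy_fuel_cost` is a hypothetical quadratic stand-in for the Lambert-transfer fuel cost (no Lambert solver is included), the velocity update is standard global-best PSO rather than the ICLPSO comprehensive-learning scheme, `schedule_policy` is a hypothetical hand-written rule standing in for the trained DDPG actor, and `baseline_reward` is a hypothetical reward measured against a plain-PSO run. The abstract does not give the paper's actual state, action or reward definitions.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]          # (iteration progress, last relative improvement)
Action = Tuple[float, float, float]  # (inertia weight w, cognitive c1, social c2)


def toy_fuel_cost(x: Sequence[float]) -> float:
    """Hypothetical stand-in for the Lambert-transfer fuel cost (total delta-v).

    A shifted quadratic with minimum 1.0 at x_i = 0.3; no orbital mechanics here.
    """
    return 1.0 + sum((xi - 0.3) ** 2 for xi in x)


def schedule_policy(state: State) -> Action:
    """Hypothetical hand-written rule standing in for a trained DDPG actor."""
    progress, improvement = state
    w = 0.9 - 0.5 * progress          # inertia decays from 0.9 towards 0.4
    if improvement < 1e-6:            # global best stalled: lean on personal memory
        return w, 2.0, 1.0
    return w, 1.5, 1.5


def adaptive_pso(
    cost: Callable[[Sequence[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    policy: Callable[[State], Action],
    n_particles: int = 20,
    n_iter: int = 50,
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    """Global-best PSO whose (w, c1, c2) are chosen by `policy` at every iteration."""
    rng = random.Random(seed)
    lo, hi = bounds
    vmax = 0.2 * (hi - lo)            # velocity clamp
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    history = [gbest_cost]            # best cost after each iteration
    improvement = 1.0                 # no history yet: treat as "still improving"
    for k in range(n_iter):
        w, c1, c2 = policy((k / n_iter, improvement))
        prev = gbest_cost
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax, min(vmax, v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
        improvement = (prev - gbest_cost) / abs(prev)  # cost is never zero here
        history.append(gbest_cost)
    return gbest, gbest_cost, history


def baseline_reward(best_cost: float, baseline_cost: float) -> float:
    """Hypothetical reward: relative fuel saving over a plain-PSO baseline run."""
    return (baseline_cost - best_cost) / abs(baseline_cost)


if __name__ == "__main__":
    constant = lambda state: (0.7, 1.5, 1.5)  # fixed hyperparameters = plain PSO
    _, base_cost, _ = adaptive_pso(toy_fuel_cost, 3, (0.0, 1.0), constant, seed=1)
    _, tuned_cost, _ = adaptive_pso(toy_fuel_cost, 3, (0.0, 1.0), schedule_policy, seed=1)
    print(base_cost, tuned_cost, baseline_reward(tuned_cost, base_cost))
```

Passing a constant policy recovers plain PSO, which is how the baseline run in the `__main__` block is produced. In an RL setting the hand-written rule would be replaced by the actor network's output, and a reward of this kind would be computed per episode during training; the toy comparison here does not reproduce the paper's results.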

Key words

Geosynchronous orbit/Lambert transfer/Reinforcement learning/Particle swarm optimization/Deep deterministic policy gradient

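Funding

National Natural Science Foundation of China (62273118)

National Natural Science Foundation of China (61876050)

National Natural Science Foundation of China (61973100)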

Year of publication

2024

Journal of Astronautics

Chinese Society of Astronautics

CSTPCD; Peking University Core Journal

Impact factor: 0.887

ISSN: 1000-1328

Number of references: 25
