

An Adaptive Hyperparameter Strategy Optimization Method for Spacecraft Rendezvous and Orbital Transfer
Using reinforcement learning (RL), this paper proposes a hyperparameter-adaptive method for optimizing fuel-optimal rendezvous transfer strategies for geosynchronous orbit (GEO) spacecraft. First, a Lambert transfer model for GEO spacecraft rendezvous is established. With the maneuver times as decision variables and fuel consumption as the fitness function, an improved comprehensive learning particle swarm optimization algorithm (ICLPSO) serves as the base method for transfer strategy optimization. Second, to account for both the optimality and the speed of the solution, the reward function is redesigned to use the particle swarm optimization (PSO) result as its reference baseline. A deep deterministic policy gradient (DDPG) network is trained on a family of typical GEO spacecraft rendezvous scenarios. DDPG is then combined with ICLPSO to form a reinforcement learning particle swarm optimization algorithm (RLPSO), which adjusts the algorithm's hyperparameters dynamically according to the convergence behavior observed during iteration. Finally, simulation results show that, compared with PSO and comprehensive learning particle swarm optimization (CLPSO), RLPSO delivers planning results with higher fitness after fewer iterations, reducing the computational resources consumed during iteration.
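The idea of adjusting swarm hyperparameters from the observed convergence state can be illustrated with a minimal sketch. It is not the method of the paper: plain PSO stands in for ICLPSO, a hand-written rule (`rule_based_policy`, a hypothetical name) stands in for the trained DDPG actor, and a toy cost (`sphere`) stands in for the Lambert fuel-consumption fitness. All function names, parameter values, and the choice of state features (iteration progress and stagnation count) are assumptions made for illustration.

```python
import random


def sphere(x):
    """Toy cost to minimize; an assumed stand-in for the fuel-consumption fitness."""
    return sum(v * v for v in x)


def rule_based_policy(progress, stagnation):
    """Hypothetical stand-in for a trained actor: convergence state -> (w, c1, c2)."""
    w = 0.9 - 0.5 * progress      # shrink inertia as the run advances
    if stagnation > 5:            # boost exploration when the best value stalls
        w = min(0.9, w + 0.2)
    c1 = 2.0 - progress           # cognitive weight decays over the run
    c2 = 1.0 + progress           # social weight grows over the run
    return w, c1, c2


def adaptive_pso(fitness, dim, bounds, policy, n_particles=20, n_iters=100, seed=0):
    """Plain PSO whose hyperparameters are queried from `policy` every iteration."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    stagnation = 0
    for it in range(n_iters):
        # The policy sees the convergence state and returns this iteration's settings.
        w, c1, c2 = policy(it / n_iters, stagnation)
        improved = False
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
                    improved = True
        stagnation = 0 if improved else stagnation + 1
        history.append(gbest_val)
    return gbest, gbest_val, history


if __name__ == "__main__":
    best, best_val, hist = adaptive_pso(sphere, dim=3, bounds=(-5.0, 5.0),
                                        policy=rule_based_policy)
    print(best_val, len(hist))
```

Passing the policy as a callable keeps the optimizer loop unchanged if the rule is later swapped for a learned actor; only the mapping from convergence state to `(w, c1, c2)` would differ.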

Keywords: Geosynchronous orbit; Lambert transfer; Reinforcement learning; Particle swarm optimization; Deep deterministic policy gradient

Sun Leixiang, Guo Yanning, Deng Wudong, Lü Yueyong, Ma Guangfu


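Institute of Space Science and Applied Technology, Harbin Institute of Technology (Shenzhen), Shenzhen 518055

School of Astronautics, Harbin Institute of Technology, Harbin 150001

Shanghai Institute of Satellite Engineering, Shanghai 201109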


Funding: National Natural Science Foundation of China (62273118, 61876050, 61973100)

2024

Journal of Astronautics
Chinese Society of Astronautics


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.887
ISSN: 1000-1328
Year, volume (issue): 2024, 45(1)