Journal of Northwestern Polytechnical University, 2024, Vol. 42, Issue 1: 117-128. DOI: 10.1051/jnwpu/20244210117

Research on game strategy of spacecraft chase and escape based on adaptive augmented random search

Jiao Jie (焦杰) 1, Gou Yongjie (苟永杰) 2, Wu Wenbo (吴文博) 1, Pan Binfeng (泮斌峰) 1

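Author information

  • 1. School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China; National Key Laboratory of Aerospace Flight Dynamics, Xi'an 710072, Shaanxi, China
  • 2. Shanghai Institute of Aerospace System Engineering, Shanghai 201108, China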

Abstract

This paper addresses the survival-type differential game interception problem that arises in the pursuit-evasion game between a spacecraft and a non-cooperative target. The pursuit-evasion strategy is studied with reinforcement learning, and an adaptive-augmented random search (A-ARS) algorithm is proposed. First, to handle the sparse-reward problem in sequential decision making, an exploration method based on perturbations in the policy parameter space is designed, which accelerates policy convergence. Second, to reduce the risk of becoming trapped prematurely in a local optimum, a novelty function is designed to guide the policy update, which improves data utilization efficiency. Finally, numerical simulations verify the method, and comparisons with augmented random search (ARS), proximal policy optimization (PPO), and deep deterministic policy gradient (DDPG) demonstrate its effectiveness and advantages.

Key words

non-cooperative target/pursuit-evasion game/differential game/reinforcement learning/sparse reward
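
To make the idea of exploring in the policy parameter space more concrete, the following is a minimal sketch of one update step of basic augmented random search, the baseline the abstract compares against. It is not the authors' A-ARS: the adaptive mechanism and the novelty function are not specified in the abstract and are deliberately left out. The function names, the hyperparameter values, and the quadratic `toy_reward` (a stand-in for a pursuit-evasion rollout return) are all assumptions made for illustration.

```python
import random
import statistics


def ars_step(theta, reward_fn, rng, n_dirs=8, top_b=4, step_size=0.05, noise_std=0.05):
    """One basic ARS update on a flat parameter vector (a list of floats)."""
    dim = len(theta)
    candidates = []
    for _ in range(n_dirs):
        # Perturb the parameters themselves, not the actions.
        delta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        r_plus = reward_fn([t + noise_std * d for t, d in zip(theta, delta)])
        r_minus = reward_fn([t - noise_std * d for t, d in zip(theta, delta)])
        candidates.append((r_plus, r_minus, delta))

    # Keep only the top_b directions whose better rollout scored highest.
    candidates.sort(key=lambda c: max(c[0], c[1]), reverse=True)
    best = candidates[:top_b]

    # Scale the step by the standard deviation of the rewards actually used.
    used = [r for r_plus, r_minus, _ in best for r in (r_plus, r_minus)]
    sigma_r = statistics.pstdev(used) or 1.0  # guard against a zero deviation

    new_theta = list(theta)
    for r_plus, r_minus, delta in best:
        scale = step_size / (top_b * sigma_r) * (r_plus - r_minus)
        for i in range(dim):
            new_theta[i] += scale * delta[i]
    return new_theta


def toy_reward(theta, target=(1.0, -2.0, 0.5)):
    """Hypothetical episode return: higher when theta is closer to target."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def run_demo(iterations=200, seed=0):
    """Run repeated ARS updates on the toy reward from a zero start."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        theta = ars_step(theta, toy_reward, rng)
    return theta
```

Calling `run_demo()` returns a parameter vector whose toy reward is higher than that of the all-zero starting point. In a real setting `reward_fn` would run a full episode and return its total reward, which is why this style of search can cope with rewards that arrive only at the end of an episode.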


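Publication year: 2024
Journal: Journal of Northwestern Polytechnical University (Northwestern Polytechnical University)
Indexing: CSTPCD, Peking University Core Journals
Impact factor: 0.496
ISSN: 1000-2758
Number of references: 23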