
Spacecraft pursuit-evasion game strategy based on adaptive augmented random search

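This paper addresses the survival-type differential-game interception problem that arises in pursuit-evasion games between a spacecraft and a non-cooperative target. The pursuit-evasion strategy is studied with reinforcement learning, and an adaptive-augmented random search (A-ARS) algorithm is proposed. To cope with the sparse rewards of sequential decision making, an exploration method based on perturbations in the policy parameter space is designed to speed up policy convergence. To counter premature convergence to a local optimum, a novelty function is designed to guide the policy update, which can improve data efficiency. Numerical simulations and comparisons with augmented random search (ARS), proximal policy optimization (PPO), and deep deterministic policy gradient (DDPG) verify the effectiveness and advancement of the method.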
Research on game strategy of spacecraft chase and escape based on adaptive augmented random search
To solve the survival-type differential game interception problem in the pursuit game between a spacecraft and a non-cooperative target, the pursuit game policy is studied based on reinforcement learning, and the adaptive-augmented random search (A-ARS) algorithm is proposed. Firstly, to address the sparse reward problem of sequential decision making, an exploration method based on perturbation in the policy parameter space is designed, which accelerates policy convergence. Secondly, to avoid falling into a local optimum prematurely, a novelty function is designed to guide the policy update, which improves the efficiency of data utilization. Finally, the effectiveness and advancement of the method are verified with numerical simulations and by comparison with the augmented random search (ARS) algorithm, the proximal policy optimization (PPO) algorithm, and the deep deterministic policy gradient (DDPG) algorithm.
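The abstract names augmented random search (ARS) as the baseline and describes exploration by perturbing the policy parameters. As a rough illustration of that idea only, the following is a minimal sketch of one basic ARS update on a toy objective. It is not the authors' A-ARS: the adaptive mechanism and the novelty function are not specified in the abstract and are omitted here. The names `ars_step`, `toy_return`, and `train`, the hyperparameter values, and the quadratic stand-in for an episode return are all assumptions made for illustration.

```python
import random
import statistics


def ars_step(theta, episode_return, rng, n_dirs=8, top_b=4, step=0.02, noise=0.1):
    """One basic ARS update: perturb the policy parameters, keep the best directions."""
    rollouts = []
    for _ in range(n_dirs):
        # Random search direction in parameter space.
        delta = [rng.gauss(0.0, 1.0) for _ in theta]
        r_plus = episode_return([t + noise * d for t, d in zip(theta, delta)])
        r_minus = episode_return([t - noise * d for t, d in zip(theta, delta)])
        rollouts.append((r_plus, r_minus, delta))
    # Keep the top_b directions, ranked by the better of their two returns.
    rollouts.sort(key=lambda r: max(r[0], r[1]), reverse=True)
    best = rollouts[:top_b]
    # Scale the step by the standard deviation of the returns that were kept
    # (fall back to 1.0 if all returns are identical).
    sigma = statistics.pstdev([r for rp, rm, _ in best for r in (rp, rm)]) or 1.0
    new_theta = list(theta)
    for r_plus, r_minus, delta in best:
        for i, d in enumerate(delta):
            new_theta[i] += step / (top_b * sigma) * (r_plus - r_minus) * d
    return new_theta


def toy_return(theta, target=(1.0, -2.0)):
    # Hypothetical stand-in for an episode return: higher is better, peak at target.
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def train(iterations=300, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(iterations):
        theta = ars_step(theta, toy_return, rng)
    return theta
```

In a pursuit-evasion setting, `toy_return` would be replaced by the return of a simulated episode under the perturbed policy. Because the update uses only whole-episode returns, this style of search does not need per-step reward gradients, which is one reason parameter-space exploration is often considered for sparse-reward problems.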

non-cooperative target; pursuit game; differential game theory; reinforcement learning; sparse reward

Jiao Jie, Gou Yongjie, Wu Wenbo, Pan Binfeng


School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China

National Key Laboratory of Aerospace Flight Dynamics, Xi'an 710072, Shaanxi, China

Shanghai Institute of Aerospace System Engineering, Shanghai 201108, China

non-cooperative target; pursuit-evasion game; differential game; reinforcement learning; sparse reward

2024

Journal of Northwestern Polytechnical University
Northwestern Polytechnical University


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.496
ISSN: 1000-2758
Year, volume (issue): 2024, 42(1)