Journal of Astronautics (宇航学报), 2024, Vol. 45, Issue 9: 1445-1455. DOI: 10.3873/j.issn.1000-1328.2024.09.010

Reinforcement Learning-Based Terminal Constrained Guidance Law for Air-to-Ground Missiles Based on Dimensionless Models

Original title: 基于无量纲模型的空地导弹强化学习制导律

Huang Xiaoyang (黄晓阳) 1, Zhou Jun (周军) 1, Zhao Bin (赵斌) 1, Xu Xinpeng (许新鹏) 1, Shen Yuheng (沈昱恒) 2

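Author information

  • 1. Institute of Precision Guidance and Control, Northwestern Polytechnical University, Xi'an 710072, China
  • 2. Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China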

Abstract

To address terminal-angle-constrained guidance for air-to-ground missiles striking ground targets, a reinforcement learning terminal guidance method based on a dimensionless model and terminal rewards is proposed. First, a dimensionless missile-target relative motion model is derived from the missile's flight kinematics equations; it reduces the size of the state and observation spaces of the reinforcement learning environment and improves the training efficiency of the network for terminal-angle-constrained guidance. Second, a reinforcement learning strategy based on terminal rewards is constructed that jointly accounts for terminal hit accuracy and terminal attack-angle accuracy without relying on a process reward function, avoiding the reward sparsity problem found in conventional reinforcement learning guidance. Third, the deep deterministic policy gradient (DDPG) algorithm is used to train a terminal guidance law with input optimization in typical scenarios. Numerical simulations show that, compared with existing methods, the proposed method achieves higher hit accuracy and attack-angle accuracy, significantly reduces the required overload, and overcomes the high computational cost and low learning efficiency of existing reinforcement learning guidance methods, indicating its potential for practical application.
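The abstract does not give the model equations, so the following is only an illustrative sketch of the two ideas it names: a dimensionless relative motion model and a terminal-only reward. It assumes a planar engagement, a stationary target, and a constant-speed missile, with range scaled by the initial range r0, time by r0/V, and lateral acceleration by V²/r0, so that neither V nor r0 appears in the equations. The function names, the reward weights `w_miss` and `w_angle`, and the proportional-navigation stand-in policy are hypothetical and are not taken from the paper; the paper trains its policy with DDPG, which is not reproduced here.

```python
import math


def step(rho, q, theta, u, dtau):
    """One explicit-Euler step of the assumed dimensionless planar kinematics.

    rho   : range divided by the initial range r0
    q     : line-of-sight angle (rad)
    theta : flight-path angle (rad)
    u     : lateral acceleration scaled by V**2 / r0
    dtau  : step in dimensionless time tau = V * t / r0
    """
    sigma = theta - q                            # lead angle between velocity and LOS
    rho_next = rho - math.cos(sigma) * dtau      # d(rho)/d(tau)   = -cos(sigma)
    q_next = q - math.sin(sigma) / rho * dtau    # d(q)/d(tau)     = -sin(sigma) / rho
    theta_next = theta + u * dtau                # d(theta)/d(tau) = u
    return rho_next, q_next, theta_next


def terminal_reward(rho_f, theta_f, theta_des, w_miss=1.0, w_angle=1.0):
    """Hypothetical terminal reward: penalize final miss and attack-angle error.

    Angles are assumed unwrapped; no process (per-step) reward is used.
    """
    return -(w_miss * max(rho_f, 0.0) + w_angle * abs(theta_f - theta_des))


def rollout(policy, rho, q, theta, theta_des,
            dtau=1e-3, rho_hit=1e-3, max_steps=5000):
    """Run one episode and return the terminal reward and the final state."""
    for _ in range(max_steps):
        closing = math.cos(theta - q) > 0.0      # range is still decreasing
        if rho <= rho_hit or not closing:
            break
        u = policy(rho, q, theta)
        rho, q, theta = step(rho, q, theta, u, dtau)
    return terminal_reward(rho, theta, theta_des), (rho, q, theta)


def pn_policy(rho, q, theta, nav_gain=3.0):
    """Stand-in policy: proportional navigation, u = N * d(q)/d(tau)."""
    return nav_gain * (-math.sin(theta - q) / rho)
```

Under these assumptions the agent would observe only (rho, q, theta), and `rollout(pn_policy, 1.0, -0.3, 0.0, theta_des=-0.3)` returns a single scalar reward for the whole episode together with the final state; a learned policy would replace `pn_policy`.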

Key words

Deep reinforcement learning (DRL) / Dimensionless model / Deep deterministic policy gradient (DDPG) algorithm / Terminal reward function / Attack angle constraint


Funding

National Natural Science Foundation of China (62373307)

Fundamental Research Funds for the Central Universities (G2022KY0608)

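Publication year

2024

Journal of Astronautics (宇航学报)
Chinese Society of Astronautics (中国宇航学会)

Indexed in: CSTPCD, CSCD, Peking University Core Journals (北大核心)
Impact factor: 0.887
ISSN: 1000-1328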
Number of references: 25