Reinforcement Learning-Based Terminal Constrained Guidance Law for Air-to-Ground Missiles Based on Dimensionless Models

To address the terminal angle-constrained guidance problem for air-to-ground missile strikes, a reinforcement learning terminal guidance method based on a dimensionless model and terminal rewards is proposed. First, a dimensionless missile-target relative motion model is established from the missile flight kinematics equations, reducing the size of the state and observation spaces of the reinforcement learning environment and effectively improving the training efficiency of the network for terminal angle-constrained guidance. Second, a reinforcement learning strategy based solely on terminal rewards is constructed, jointly accounting for terminal hit accuracy and terminal attack-angle accuracy without relying on a process reward function, which avoids the reward-sparsity problem of conventional reinforcement learning guidance. Third, the deep deterministic policy gradient (DDPG) algorithm is used to train the terminal guidance law with input optimization in typical scenarios. Mathematical simulations show that, compared with existing methods, the proposed method achieves higher hit accuracy and attack-angle accuracy, significantly reduces the required overload, and effectively overcomes the high computational cost and low learning efficiency of existing reinforcement learning guidance methods, demonstrating its potential application value.
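The abstract names two key ingredients: a dimensionless observation model that shrinks the state space, and a terminal-only reward that scores hit accuracy and attack-angle accuracy at episode end. The paper's exact formulas are not given in this record, so the following is an illustrative sketch only; every function name, scale constant, and functional form is an assumption, not the authors' method.

```python
def dimensionless_observation(r, r0, lam, gamma):
    """Hypothetical dimensionless observation vector.

    Normalizing the missile-target range r by the initial range r0 removes
    the absolute-distance scale from the state, which is the kind of
    observation-space reduction the abstract attributes to the dimensionless
    model. lam is the line-of-sight angle and gamma the flight-path angle,
    both already dimensionless (radians).
    """
    return (r / r0, lam, gamma)


def terminal_reward(done, miss_distance, angle_error_deg,
                    miss_scale=10.0, angle_scale=5.0):
    """Hypothetical terminal-only reward.

    Returns 0.0 on every non-terminal step (no process reward, so the
    learning signal is concentrated at episode end, as the abstract
    describes) and, on the final step, jointly penalizes the miss distance
    [m] and the impact-angle error [deg]. miss_scale and angle_scale are
    illustrative weighting constants.
    """
    if not done:
        return 0.0
    return -(miss_distance / miss_scale + abs(angle_error_deg) / angle_scale)
```

Under such a scheme, a DDPG agent would receive a nonzero reward only at episode termination; intermediate steps are credited indirectly through bootstrapped value estimates in the critic.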

Keywords: deep reinforcement learning (DRL); dimensionless model; deep deterministic policy gradient (DDPG); terminal reward function; attack angle constraint

Authors: HUANG Xiaoyang, ZHOU Jun, ZHAO Bin, XU Xinpeng, SHEN Yuheng


Institute of Precision Guidance and Control, Northwestern Polytechnical University, Xi'an 710072, China

Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China


Funding: National Natural Science Foundation of China (62373307); Fundamental Research Funds for the Central Universities (G2022KY0608)

2024

Journal of Astronautics (宇航学报)
Chinese Society of Astronautics

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.887
ISSN: 1000-1328
Year, Volume (Issue): 2024, 45(9)