
Trajectory planning of radar observer based on Monte Carlo policy gradient

For the radar observer trajectory planning (OTP) problem in the target tracking process, and specifically the intelligent decision-making problem of Markov step-wise planning, a radar trajectory planning method based on the Monte Carlo policy gradient (MCPG) algorithm is proposed over a discrete action space. First, the OTP process is modeled as a continuous Markov decision process (MDP) by jointly considering the target tracking state, the reward mechanism, the action scheme, and the radar observer position, and a global intelligent planning method based on MCPG is proposed. Next, by treating each time step within the tracking episode as a separate episode for policy updates, a step-wise intelligent planning method for the observer trajectory in MCPG-based target tracking is proposed; the tracking estimation characteristics of the target are studied in depth, and a reward function aimed at optimizing tracking performance is constructed. Finally, simulation experiments on reinforcement-learning-based intelligent OTP decision-making in optimal nonlinear target tracking demonstrate the effectiveness of the proposed method.
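The abstract describes a REINFORCE-style Monte Carlo policy gradient over a discrete observer action space, driven by a tracking-performance reward. The following is a minimal, self-contained sketch of that update rule, assuming a linear softmax policy, a toy transition function (`step`), and a placeholder reward standing in for the negative trace of the tracking filter's error covariance; the dimensions, hyperparameters, and environment here are illustrative assumptions, not the paper's implementation.

```python
# Minimal Monte Carlo policy gradient (REINFORCE) sketch for discrete-action
# observer trajectory planning. The environment, reward, and hyperparameters
# are illustrative assumptions, not the paper's exact implementation.
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 5          # candidate observer heading changes (discrete action space)
STATE_DIM = 4          # e.g. a compact tracking-state feature fed to the policy
ALPHA = 0.01           # policy-gradient step size
GAMMA = 0.95           # discount factor

theta = np.zeros((STATE_DIM, N_ACTIONS))   # linear softmax policy parameters


def policy(state):
    """Softmax probabilities over the discrete observer actions."""
    logits = state @ theta
    logits -= logits.max()                 # numerical stability
    p = np.exp(logits)
    return p / p.sum()


def step(state, action):
    """Placeholder transition: returns a next state and a reward that stands in
    for tracking quality, e.g. the negative trace of the filter's error
    covariance (assumed here; the paper constructs its own reward function)."""
    next_state = state + 0.1 * rng.standard_normal(STATE_DIM)
    reward = -np.abs(next_state).sum()     # proxy for -trace(P_k)
    return next_state, reward


for episode in range(200):
    # --- roll out one episode with the current policy ---
    states, actions, rewards = [], [], []
    s = rng.standard_normal(STATE_DIM)
    for k in range(20):
        p = policy(s)
        a = rng.choice(N_ACTIONS, p=p)
        s_next, r = step(s, a)
        states.append(s); actions.append(a); rewards.append(r)
        s = s_next

    # --- Monte Carlo return G_k and REINFORCE update of theta ---
    G = 0.0
    for k in reversed(range(len(rewards))):
        G = rewards[k] + GAMMA * G
        p = policy(states[k])
        grad_log = -np.outer(states[k], p)       # d log pi / d theta for softmax
        grad_log[:, actions[k]] += states[k]
        theta += ALPHA * (GAMMA ** k) * G * grad_log
```

In the paper's step-wise variant, each planning time step is treated as its own episode; in this sketch that would correspond to shortening the inner rollout to a single transition and updating theta immediately after it.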

Keywords: target tracking; radar observer trajectory planning; policy gradient; reward function

陈辉、王荆宇、张文旭、赵永红、席磊


College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, Gansu, China

Gansu Changfeng Electronic Science and Technology Co., Ltd., Lanzhou 730070, Gansu, China

Institute of Automation, Gansu Academy of Sciences, Lanzhou 730000, Gansu, China


Funding: National Natural Science Foundation of China (62163023, 62366031, 62363023, 61873116); Major Special Project of the Gansu Academy of Sciences (2023ZDZX-03); 2023 Gansu Province Special Fund for Military-Civilian Integration Development; 2024 Gansu Province Key Talent Project

2024

Journal of Lanzhou University of Technology
Lanzhou University of Technology


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.57
ISSN: 1673-5196
Year, volume (issue): 2024, 50(5)