Trajectory planning of radar observer based on Monte Carlo policy gradient
In radar observer trajectory planning (OTP) for target tracking, the observer must make intelligent, step-by-step decisions in a Markov setting. To address this decision-making problem, a radar trajectory planning method based on the Monte Carlo policy gradient (MCPG) algorithm is proposed for a discrete action space. First, the OTP process is modeled as a continuous Markov decision process (MDP) by combining the target tracking state, the reward mechanism, the action plan, and the radar observer position. A global intelligent planning method based on MCPG is then proposed. Next, by treating each time step within the tracking episode as a separate episode for policy updates, a step-wise intelligent planning method for the observer trajectory in MCPG-based target tracking is proposed. The tracking estimation characteristics of the target are then analyzed in depth, and a reward function aimed at optimizing tracking performance is constructed. Finally, simulation experiments on reinforcement learning-based intelligent OTP decision-making in optimal nonlinear target tracking demonstrate the effectiveness of the proposed method.
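To make the MCPG idea concrete, the sketch below shows a plain REINFORCE (Monte Carlo policy gradient) update over a discrete set of observer moves. It is a toy illustration under stated assumptions, not the authors' method: the four unit moves in `ACTIONS`, the fixed `TARGET` position, the state-independent softmax policy, and the reward (reduction in observer-to-target range, used as a stand-in for a tracking-performance reward) are all hypothetical choices made for brevity. Returns are computed backwards from each complete episode, and the gradient of the log-softmax is `one_hot(a) - probs`.

```python
import math
import random

# Hypothetical discrete action set: unit observer moves (east, north, west, south).
ACTIONS = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
TARGET = (10.0, 0.0)  # hypothetical fixed target position


def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma):
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def run_episode(theta, rng, steps):
    """Roll out one episode; the reward is a hypothetical proxy (range reduction)."""
    x, y = 0.0, 0.0
    probs = softmax(theta)  # policy is state-independent in this toy sketch
    actions, rewards = [], []
    for _ in range(steps):
        a = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        d_old = math.hypot(TARGET[0] - x, TARGET[1] - y)
        x, y = x + ACTIONS[a][0], y + ACTIONS[a][1]
        d_new = math.hypot(TARGET[0] - x, TARGET[1] - y)
        actions.append(a)
        rewards.append(d_old - d_new)
    return actions, rewards


def reinforce(episodes=500, steps=10, alpha=0.05, gamma=0.99, seed=0):
    """Plain REINFORCE (no baseline): one policy update per complete episode."""
    rng = random.Random(seed)
    theta = [0.0] * len(ACTIONS)
    for _ in range(episodes):
        actions, rewards = run_episode(theta, rng, steps)
        returns = discounted_returns(rewards, gamma)
        probs = softmax(theta)  # same policy that generated the episode
        grad = [0.0] * len(theta)
        for t, (a, g) in enumerate(zip(actions, returns)):
            # Gradient of log pi(a) for a softmax policy: one_hot(a) - probs.
            for b in range(len(theta)):
                indicator = 1.0 if b == a else 0.0
                grad[b] += (gamma ** t) * g * (indicator - probs[b])
        theta = [th + alpha * gr for th, gr in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    print([round(p, 3) for p in softmax(reinforce())])
```

After training, the policy should concentrate on the move that closes range fastest (east, toward the hypothetical target). The step-wise variant described in the abstract would instead treat each tracking time step as its own episode; that variant, a state-dependent policy, and a tracking-error-based reward are not shown here.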
Keywords: target tracking; radar observer trajectory planning; policy gradient; reward function