The pursuit-evasion problem is of great importance in confrontation, tracking, and searching. This paper studies optimal strategies for the multi-pursuer, single-evader problem in which only measured distances are available, within the framework of continuous stochastic games and Markov decision processes (MDPs). In this problem, only the leader of the pursuers can measure its relative distance to the evader, while the evader has a global view. The strategies of the pursuers and the evader are established in two steps: the pursuit game and the MDP. For the pursuers' strategy, a belief region state is introduced by partitioning the environment to estimate the evader's position, and this belief region state is further corrected using the measured distances. A continuous stochastic pursuit game is then formed on the belief region state, and the existence of stationary Nash equilibrium strategies is shown through a fixed-point theorem. For the evader's strategy, an MDP with global states is established and the corresponding Bellman optimality equation is derived. Moreover, a reinforcement-learning-based algorithm is presented for computing stationary pursuit-evasion strategies, and an example is included to demonstrate the effectiveness of the proposed method.
Key words
Pursuit-evasion problem/belief region state/continuous stochastic game/Markov decision process (MDP)/reinforcement learning
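
The abstract describes correcting a belief region state with the leader's measured distance but does not give the update rule. The following is only a minimal sketch of one way such a correction could work, not the paper's method. It assumes the environment is partitioned into equal cells, that the belief is a probability per cell, and that a cell is ruled out when the measured distance cannot be reached from any point inside it. The function name `correct_belief` and the parameters `centers` and `half_diag` are hypothetical.

```python
import math


def correct_belief(belief, centers, half_diag, leader_pos, measured_dist):
    """Hypothetical belief correction: drop regions inconsistent with the
    leader's measured distance to the evader, then renormalize."""
    corrected = []
    for prob, center in zip(belief, centers):
        d_center = math.dist(leader_pos, center)
        # By the triangle inequality, a point inside the cell can be at the
        # measured distance only if that distance is within half a cell
        # diagonal of the distance to the cell center.
        consistent = abs(d_center - measured_dist) <= half_diag
        corrected.append(prob if consistent else 0.0)
    total = sum(corrected)
    if total == 0.0:
        # No region is consistent (assumed here to mean a bad measurement):
        # fall back to the uncorrected belief.
        return list(belief)
    return [p / total for p in corrected]


# Usage: a 2x2 grid of unit cells with a uniform prior belief.
centers = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
prior = [0.25, 0.25, 0.25, 0.25]
posterior = correct_belief(prior, centers, math.sqrt(2) / 2, (0.5, 0.5), 1.0)
print(posterior)
```

In this usage the leader sits at the center of the first cell and measures a distance of 1.0, so the first cell is excluded and the remaining probability is shared equally among the other three. The test is a necessary condition only, so it may keep some cells that a finer geometric check would exclude.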