
Distance Information Based Pursuit-evasion Strategy: Continuous Stochastic Game With Belief State

Pursuit-evasion problems are of practical importance in confrontation, tracking, and search. Using the frameworks of continuous stochastic games and Markov decision processes (MDPs), this paper studies optimal strategies for the multi-pursuer, single-evader problem when only measured distances are available. In this problem, only the leader of the pursuers can measure its relative distance to the evader, while the evader has a global view. The strategies are derived in two parts: a pursuit game and an MDP. For the pursuers' strategy, the environment is partitioned to introduce a belief region state that estimates the evader's position, and this belief region state is corrected using the measured distances. A continuous stochastic pursuit game is then constructed on the belief region state, and the existence of stationary Nash equilibrium strategies is proved via a fixed-point theorem. For the evader's strategy, an MDP with global states is established from the evader's global information, together with the corresponding optimal Bellman equation. A reinforcement-learning-based algorithm for computing stationary pursuit-evasion strategies is also presented, and a case study verifies its effectiveness.
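The abstract only outlines how the belief region state is formed and corrected, so the following is a minimal illustrative sketch, not the authors' construction. It assumes a rectangular environment split into a uniform grid, a belief stored as one probability per cell, and a hard consistency check with a tolerance `tol`. The names `make_regions` and `correct_belief` and all parameter values are hypothetical.

```python
import math


def make_regions(width, height, nx, ny):
    """Partition a width x height rectangle into nx * ny cells.

    Returns the cell centers in row-major order (index = j * nx + i).
    The uniform grid is an assumption made for illustration only.
    """
    dx, dy = width / nx, height / ny
    return [((i + 0.5) * dx, (j + 0.5) * dy)
            for j in range(ny) for i in range(nx)]


def correct_belief(belief, centers, leader_pos, measured_dist, tol):
    """Correct a belief over regions using one distance measurement.

    A region keeps its probability mass only if the distance from the
    leader to the region center is within `tol` of the measurement;
    the remaining mass is then renormalized to sum to 1.
    """
    updated = []
    for prob, (cx, cy) in zip(belief, centers):
        d = math.hypot(cx - leader_pos[0], cy - leader_pos[1])
        updated.append(prob if abs(d - measured_dist) <= tol else 0.0)
    total = sum(updated)
    if total == 0.0:
        # No region matches the measurement: keep the prior unchanged.
        return list(belief)
    return [u / total for u in updated]


# Hypothetical 4 x 4 environment with 16 unit cells and a uniform prior.
centers = make_regions(4.0, 4.0, 4, 4)
prior = [1.0 / len(centers)] * len(centers)
posterior = correct_belief(prior, centers, leader_pos=(0.5, 0.5),
                           measured_dist=2.0, tol=0.5)
```

In this toy setup the measurement leaves probability mass only on the cells whose centers lie roughly 2 units from the leader, so the belief concentrates on a ring of regions. A probabilistic measurement model could replace the hard threshold, but that choice is not specified in the abstract.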

Keywords: pursuit-evasion problem; belief region state; continuous stochastic game; Markov decision process (MDP); reinforcement learning

Chen Lingmin (陈灵敏), Feng Yu (冯宇), Li Yongqiang (李永强)


College of Information Engineering, Zhejiang University of Technology, Hangzhou 313000


Funding: National Natural Science Foundation of China (61973276, 62073294); Zhejiang Provincial Natural Science Foundation (LZ21F030003)

2024

Acta Automatica Sinica
Chinese Association of Automation; Institute of Automation, Chinese Academy of Sciences


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.762
ISSN: 0254-4156
Year, volume (issue): 2024, 50(4)