Acta Automatica Sinica, 2024, Vol. 50, Issue 4: 828-840. DOI: 10.16383/j.aas.c230018

Distance Information Based Pursuit-evasion Strategy: Continuous Stochastic Game With Belief State

陈灵敏 冯宇 李永强

Author information

  • 1. College of Information Engineering, Zhejiang University of Technology, Hangzhou 313000

Abstract

The pursuit-evasion problem is of great practical importance in fields such as confrontation, tracking, and search. This paper studies optimal strategies for the multi-pursuer, single-evader problem using only measured distances, within the framework of continuous stochastic games and the Markov decision process (MDP). In this problem, only the leader of the pursuers can measure its relative distance to the evader, while the evader has a global view. The strategies of the pursuers and the evader are obtained in two steps: a pursuit game and an MDP. For the pursuers' strategy, a belief region state is introduced by partitioning the environment to estimate the evader's position, and this belief region state is then corrected using the measured distances. A continuous stochastic pursuit game is formed on the belief region state, and the existence of stationary Nash equilibrium strategies is shown via a fixed-point theorem. For the evader's strategy, the evader uses its global information to establish an MDP over hybrid states, together with the corresponding Bellman optimality equation. In addition, a reinforcement learning based algorithm is presented for computing stationary pursuit-evasion strategies, and an example demonstrates the effectiveness of the algorithm.
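
The belief-correction idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the square grid partition, the hard consistency test with tolerance `tol`, and the names `correct_belief`, `centers`, and `leader_pos` are all assumptions made for illustration. The sketch keeps a probability for each region and discards regions whose centre is inconsistent with the distance measured by the leader.

```python
import numpy as np

def correct_belief(belief, centers, leader_pos, measured_dist, tol):
    """Correct a belief over regions using one range measurement.

    Hypothetical rule (an assumption, not taken from the paper): a region
    stays plausible only if the distance from the leader to its centre is
    within `tol` of the measured distance.
    """
    dists = np.linalg.norm(centers - leader_pos, axis=1)
    consistent = np.abs(dists - measured_dist) <= tol
    posterior = belief * consistent
    total = posterior.sum()
    if total == 0.0:
        # Prior and measurement disagree: fall back to the consistent set.
        posterior = consistent.astype(float)
        total = posterior.sum()
    if total == 0.0:
        # No region matches the measurement at all: keep the prior.
        return belief.copy()
    return posterior / total

# Partition a 4 x 4 environment into unit cells and store the cell centres.
xs, ys = np.meshgrid(np.arange(4) + 0.5, np.arange(4) + 0.5, indexing="ij")
centers = np.column_stack([xs.ravel(), ys.ravel()])

# Uniform prior: the evader could be in any region.
belief = np.full(len(centers), 1.0 / len(centers))

# The leader sits in the corner cell and measures a distance of 2.0.
leader_pos = np.array([0.5, 0.5])
posterior = correct_belief(belief, centers, leader_pos, measured_dist=2.0, tol=0.3)
```

In this toy setting the measurement narrows the belief from sixteen regions to the four whose centres lie roughly two units from the leader. A full treatment would also need a prediction step for the evader's motion between measurements, which is omitted here.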

Key words

Pursuit-evasion problem / belief region state / continuous stochastic game / Markov decision process (MDP) / reinforcement learning

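Funding

National Natural Science Foundation of China (61973276)

National Natural Science Foundation of China (62073294)

Natural Science Foundation of Zhejiang Province (LZ21F030003)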

Publication year

2024
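Acta Automatica Sinica
Chinese Association of Automation; Institute of Automation, Chinese Academy of Sciences

Indexing: CSTPCD; Peking University Core Journals
Impact factor: 1.762
ISSN: 0254-4156
Cited by: 1
References: 42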