
Efficiently tracking multi-strategic opponents: A context-aware Bayesian policy reuse approach

In Markov games, accurately detecting opponent policies and reusing optimal response policies remains a challenging problem. Most previous works assume that opponents switch their policies infrequently, and only at the end of an episode. In practice, however, opponents may change their policies at high frequency, or even within an episode. Moreover, the agent may achieve inconsistent optimal returns under different opponent behaviors, which makes policy detection even more challenging. This paper studies how to handle a non-stationary opponent with abrupt policy changes through accurate policy detection and direct policy reuse. Specifically, we propose a context-aware Bayesian policy reuse (CABPR) algorithm to accurately identify and track the multi-strategic opponent. To infer the opponent policy continuously, an intra-episode belief is introduced that takes advantage of opponent models. This intra-episode belief and an inter-episode belief obtained by Bayesian inference are then used jointly to detect the opponent type from its behaviors and episodic rewards, and the agent reuses the corresponding best response policies. We demonstrate the advantages of the proposed algorithm over several state-of-the-art algorithms in terms of episodic rewards, accumulated rewards, and detection accuracy in four competitive scenarios.
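The two-level belief described in the abstract can be illustrated with a minimal Bayesian update sketch. This is not the paper's implementation; the function names, the `opponent_models` interface, and the `reward_likelihood` densities are all hypothetical, chosen only to show how intra-episode action observations and inter-episode returns can each update a posterior over opponent types:

```python
import numpy as np

def intra_episode_update(belief, state, action, opponent_models):
    """Bayes update over opponent types from one observed opponent action.

    opponent_models[k](state) is assumed to return a probability vector over
    the opponent's actions under type k (a learned opponent model).
    """
    likelihood = np.array([m(state)[action] for m in opponent_models])
    posterior = belief * likelihood
    total = posterior.sum()
    if total == 0.0:  # guard: no type explains the action, reset to uniform
        return np.ones_like(belief) / belief.size
    return posterior / total

def inter_episode_update(belief, episodic_reward, reward_likelihood):
    """Bayes update from the episode's return, in the spirit of Bayesian
    policy reuse.  reward_likelihood[k] is an assumed density
    P(reward | opponent type k), e.g. fitted from past episodes.
    """
    likelihood = np.array([f(episodic_reward) for f in reward_likelihood])
    posterior = belief * likelihood
    return posterior / posterior.sum()

# Tiny usage example: two opponent types, two opponent actions.
models = [
    lambda s: np.array([0.9, 0.1]),  # type 0 mostly plays action 0
    lambda s: np.array([0.2, 0.8]),  # type 1 mostly plays action 1
]
belief = np.array([0.5, 0.5])
belief = intra_episode_update(belief, state=None, action=1,
                              opponent_models=models)
# The agent would then reuse the response policy for the most likely type.
best_response = int(np.argmax(belief))
```

After observing action 1, the posterior shifts sharply toward type 1, so the agent would switch to its pre-trained response against that type without waiting for the episode to end.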

Keywords: Bayesian policy reuse; Markov games; Multi-strategic opponent; Opponent model

Chen H., Liu Q., Huang J., Fu K.


College of Intelligence Science and Technology, National University of Defense Technology, Changsha

2022

Applied Soft Computing


Indexed in: EI, SCI
ISSN:1568-4946
Year, Volume: 2022, 121