
Efficiently tracking multi-strategic opponents: A context-aware Bayesian policy reuse approach

In Markov games, accurately detecting opponent policies and reusing optimal response policies remains a challenging problem. Most previous works assume that opponents switch their policies infrequently, and only at the end of an episode. In practice, however, opponents may change their policies at high frequency, or even within an episode. Moreover, the agent may achieve inconsistent optimal returns under different opponent behaviors, which makes policy detection even more challenging. This paper studies how to handle a non-stationary opponent with abrupt policy changes through accurate policy detection and direct policy reuse. Specifically, we propose a context-aware Bayesian policy reuse (CABPR) algorithm to accurately identify and track the multi-strategic opponent. To infer the opponent policy continuously, an intra-episode belief is introduced that takes advantage of opponent models. This intra-episode belief and an inter-episode belief obtained by Bayesian inference are then used jointly to detect the opponent type from its behaviors and episodic rewards, and the agent reuses the corresponding best response policies. We demonstrate the advantages of the proposed algorithm over several state-of-the-art algorithms in terms of episodic rewards, accumulated rewards, and detection accuracy in four competitive scenarios.
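The two-level belief described in the abstract can be illustrated with a minimal Bayesian update sketch. This is not the paper's implementation; the function names, the `opponent_models` interface, and the `reward_likelihood` densities are all hypothetical, chosen only to show how intra-episode action observations and inter-episode returns can each update a posterior over opponent types:

```python
import numpy as np

def intra_episode_update(belief, state, action, opponent_models):
    """Bayes update over opponent types from one observed opponent action.

    opponent_models[k](state) is assumed to return a probability vector over
    the opponent's actions under type k (a learned opponent model).
    """
    likelihood = np.array([m(state)[action] for m in opponent_models])
    posterior = belief * likelihood
    total = posterior.sum()
    if total == 0.0:  # guard: no type explains the action, reset to uniform
        return np.ones_like(belief) / belief.size
    return posterior / total

def inter_episode_update(belief, episodic_reward, reward_likelihood):
    """Bayes update from the episode's return, in the spirit of Bayesian
    policy reuse.  reward_likelihood[k] is an assumed density
    P(reward | opponent type k), e.g. fitted from past episodes.
    """
    likelihood = np.array([f(episodic_reward) for f in reward_likelihood])
    posterior = belief * likelihood
    return posterior / posterior.sum()

# Tiny usage example: two opponent types, two opponent actions.
models = [
    lambda s: np.array([0.9, 0.1]),  # type 0 mostly plays action 0
    lambda s: np.array([0.2, 0.8]),  # type 1 mostly plays action 1
]
belief = np.array([0.5, 0.5])
belief = intra_episode_update(belief, state=None, action=1,
                              opponent_models=models)
# The agent would then reuse the response policy for the most likely type.
best_response = int(np.argmax(belief))
```

After observing action 1, the posterior shifts sharply toward type 1, so the agent would switch to its pre-trained response against that type without waiting for the episode to end.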

Keywords: Bayesian policy reuse; Markov games; Multi-strategic opponent; Opponent model

Chen H., Liu Q., Huang J., Fu K.


College of Intelligence Science and Technology, National University of Defense Technology, Changsha

2022

Applied Soft Computing


Indexed in: EI, SCI
ISSN:1568-4946
Year, Volume: 2022, 121