Decision making for two-player zero-sum Markov games with indistinguishable opponents
This paper investigates a typical class of incomplete-information games: two-player zero-sum Markov games with indistinguishable opponents, where the opponent types are diverse and cannot be known at the beginning of a game. We propose a model-based multi-agent reinforcement learning algorithm, distinguishing-opponent minimax Q-learning (DOMQ). The algorithm first builds an empirical model of the opponent-related environment, then uses this empirical model to learn a Nash equilibrium strategy, and finally deploys the corresponding Nash equilibrium strategy to guarantee a lower bound on the return in the actual game. The only information the proposed DOMQ algorithm requires is the opponent's type at the end of each episode of the sampling period; no other information about the environment is needed. Simulation results verify the effectiveness of the proposed algorithm.
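To make the minimax Q-learning step concrete, below is a minimal sketch of a tabular minimax Q-update on a toy one-state zero-sum game. This is not the paper's DOMQ algorithm: the opponent types and the empirical environment model are omitted, the payoff matrix is invented for illustration, and for simplicity the inner optimisation uses the pure-strategy maximin (security) value rather than a linear program over mixed strategies.

```python
import itertools

# Toy one-state zero-sum game (illustrative payoffs, not from the paper).
# R[a][o] is the reward to the protagonist; the opponent tries to minimise it.
R = [[3.0, 1.0], [4.0, 0.0]]
GAMMA, ALPHA = 0.9, 0.1

# Q[a][o]: joint-action value table for the single state.
Q = [[0.0, 0.0], [0.0, 0.0]]

def maximin(Q):
    """Pure-strategy security value: best own action against the worst-case
    opponent response (a simplification of the mixed-strategy minimax value)."""
    return max(min(row) for row in Q)

# Deterministic sweeps over all joint actions stand in for sampled experience.
for _ in range(4000):
    for a, o in itertools.product(range(2), range(2)):
        target = R[a][o] + GAMMA * maximin(Q)          # minimax Bellman backup
        Q[a][o] += ALPHA * (target - Q[a][o])

print(round(maximin(Q), 2))  # → 10.0, the fixed point of v = 1 + 0.9 v
```

Playing the maximin action then secures at least this value against any opponent behaviour, which is the lower-bound guarantee the abstract refers to; the full algorithm would obtain it per opponent type from the learned empirical model.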