Decision making for two-player zero-sum Markov games with indistinguishable opponents

This paper investigates a typical class of incomplete-information games: two-player zero-sum Markov games with indistinguishable opponents. Opponents come in diverse types, and the type of the current opponent cannot be known before each game begins. We propose a model-based multi-agent reinforcement learning algorithm, distinguishing opponent minimax Q-learning (DOMQ). The algorithm first builds an empirical model of the opponent-related environment and then uses this model to learn Nash equilibrium strategies. In actual play, our agent identifies the opponent's type from the empirical model and adopts the corresponding Nash equilibrium strategy, which guarantees a lower bound on its return. DOMQ needs only the opponent's type, revealed at the end of each episode during the sampling phase. It requires no other information about the environment. Simulation results verify the effectiveness of the proposed algorithm.
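The abstract does not give DOMQ's update rules, so the following Python sketch is only an illustration under our own assumptions. It is not the authors' implementation. It shows three pieces the description implies:

- An exact maximin (Nash equilibrium) solve of a per-state matrix game. This is restricted here to an agent with two actions, so no LP solver is needed.
- The standard minimax-Q backup built on that solve.
- A hypothetical rule that picks the opponent type whose empirical action counts best explain the opponent moves seen so far. The counts are assumed to be gathered per type during the sampling phase.

The function names `maximin_two_actions`, `minimax_q_update` and `identify_type`, the count-table layout, and the add-one smoothing are all illustrative choices.

```python
import itertools
import math


def maximin_two_actions(payoff):
    """Solve a matrix game exactly when the maximizing (row) player has 2 actions.

    payoff[i][j]: row player's payoff for own action i vs. opponent action j.
    Returns (value, p0), where p0 is the probability of playing action 0.
    """
    cols = list(zip(payoff[0], payoff[1]))  # one (u0, u1) pair per opponent action

    def worst_case(p0):
        # The opponent best-responds, so the row player gets the minimum over columns.
        return min(p0 * u0 + (1 - p0) * u1 for u0, u1 in cols)

    # worst_case is concave and piecewise linear on [0, 1]; its maximum lies at an
    # endpoint or at a point where two columns' payoff lines cross.
    candidates = {0.0, 1.0}
    for (a0, a1), (b0, b1) in itertools.combinations(cols, 2):
        denom = (a0 - a1) - (b0 - b1)  # zero when the two lines are parallel
        if denom != 0:
            p0 = (b1 - a1) / denom
            if 0.0 <= p0 <= 1.0:
                candidates.add(p0)
    best = max(candidates, key=worst_case)
    return worst_case(best), best


def minimax_q_update(Q, V, s, a, o, r, s_next, alpha=0.1, gamma=0.9):
    """One minimax-Q backup; Q[s] is a 2 x m table, V[s] the state's maximin value."""
    Q[s][a][o] = (1 - alpha) * Q[s][a][o] + alpha * (r + gamma * V[s_next])
    V[s], p0 = maximin_two_actions(Q[s])
    return p0  # updated probability of own action 0 in state s


def identify_type(opp_counts, observed, n_opp_actions):
    """Hypothetical rule: return the opponent type whose empirical action
    frequencies give the observed (state, opponent_action) pairs the highest
    log-likelihood. opp_counts[k][s][o] counts how often type k played o in s
    during sampling; add-one smoothing avoids log(0) for unseen actions.
    """
    def log_likelihood(k):
        total = 0.0
        for s, o in observed:
            row = opp_counts[k].get(s, {})
            n = sum(row.values())
            total += math.log((row.get(o, 0) + 1) / (n + n_opp_actions))
        return total

    return max(opp_counts, key=log_likelihood)


if __name__ == "__main__":
    print(maximin_two_actions([[1, -1], [-1, 1]]))  # matching pennies -> (0.0, 0.5)
    counts = {"A": {"s0": {0: 9}}, "B": {"s0": {1: 9}}}
    print(identify_type(counts, [("s0", 0), ("s0", 0)], n_opp_actions=2))  # -> A
```

A DOMQ-like setting could keep a separate Q-table for each opponent type. Each table would be trained on transitions generated from that type's empirical model. At play time, the agent would call `identify_type` and follow the maximin strategy of the selected table. If the agent has more than two actions, the breakpoint enumeration would need to be replaced by a linear-programming solve.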

Keywords: two-player zero-sum Markov game; incomplete information; minimax Q-learning; Nash equilibrium; multi-agent reinforcement learning

Wang Chengyi (王成意), Zhu Jin (朱进), Zhao Yunbo (赵云波)

School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230026, China

2024

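Control Theory & Applications (控制理论与应用)
Sponsored by: South China University of Technology; Academy of Mathematics and Systems Science, Chinese Academy of Sciences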

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.076
ISSN: 1000-8152
Year, volume (issue): 2024, 41(11)