Control Theory & Applications, 2024, Vol. 41, Issue 11: 2131-2138. DOI: 10.7641/CTA.2023.20630

Decision making for two-player zero-sum Markov games with indistinguishable opponents

WANG Chengyi, ZHU Jin, ZHAO Yunbo

Author information

  • 1. School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, Anhui, China

Abstract

This paper investigates a typical class of incomplete-information games: two-player zero-sum Markov games with indistinguishable opponents, in which the opponent types are diverse and the opponent's type cannot be known before each game begins. We propose a model-based multi-agent reinforcement learning algorithm, distinguishing-opponent minimax Q-learning (DOMQ). The algorithm first builds an empirical model of the opponent-related environment, then uses the empirical model to learn a Nash equilibrium strategy for each opponent type. During actual play, the agent identifies the opponent type from the empirical model and applies the corresponding Nash equilibrium strategy, which guarantees a lower bound on its return. The only information the proposed DOMQ algorithm requires is the opponent's type at the end of each episode during the sampling phase; no other knowledge of the environment is needed. Simulation results verify the effectiveness of the proposed algorithm.
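The planning step the abstract describes, computing a minimax (maximin) value on an empirical model of the game against one opponent type, can be sketched as follows. This is a minimal illustration, not the paper's DOMQ algorithm: all names (`minimax_value_iteration`, the toy one-state game) are hypothetical, and the stage games are assumed to have pure-strategy saddle points so the minimax step reduces to `max_a min_b`; the general mixed-strategy case requires solving a matrix-game linear program at each state.

```python
def minimax_value_iteration(states, actions_a, actions_b, P, R, gamma=0.9, iters=200):
    """Value iteration for a zero-sum Markov game on an (empirical) model.

    Computes V(s) = max_a min_b [ R(s,a,b) + gamma * sum_s' P(s'|s,a,b) V(s') ],
    assuming every stage game has a pure-strategy saddle point.
    P[s][a][b] maps next states to probabilities; R[s][a][b] is the stage reward.
    """
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {
            s: max(
                min(
                    R[s][a][b] + gamma * sum(p * V[s2] for s2, p in P[s][a][b].items())
                    for b in actions_b
                )
                for a in actions_a
            )
            for s in states
        }
    return V


# Toy one-state game whose payoff matrix [[1, 2], [0, 3]] has a saddle point
# of value 1, so the discounted value converges to 1 / (1 - gamma) = 10.
states, A, B = ["s"], [0, 1], [0, 1]
P = {"s": {a: {b: {"s": 1.0} for b in B} for a in A}}
R = {"s": {0: {0: 1.0, 1: 2.0}, 1: {0: 0.0, 1: 3.0}}}
V = minimax_value_iteration(states, A, B, P, R)
```

In a DOMQ-style scheme, one such value (or Q) table would be learned per opponent type on that type's empirical model; at play time the agent dispatches to the table matching the identified opponent type, which is what guarantees the return lower bound.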

Key words

two-player zero-sum Markov game / incomplete information / minimax Q-learning / Nash equilibrium / multi-agent reinforcement learning


Publication year: 2024
Journal: Control Theory & Applications (sponsored by South China University of Technology and the Academy of Mathematics and Systems Science, Chinese Academy of Sciences)
Indexed in: CSTPCD, CSCD, PKU Core Journals
Impact factor: 1.076
ISSN: 1000-8152