Control Theory & Applications, 2024, Vol. 41, Issue 11: 2131-2138. DOI: 10.7641/CTA.2023.20630

Decision making for two-player zero-sum Markov games with indistinguishable opponents

WANG Chengyi, ZHU Jin, ZHAO Yunbo

Author information

  • 1. School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, Anhui, China

Abstract

This paper investigates a typical class of incomplete-information games: two-player zero-sum Markov games with indistinguishable opponents, in which the opponent types are diverse and the opponent's type cannot be known before each game begins. We propose a model-based multi-agent reinforcement learning algorithm, distinguishing-opponent minimax Q-learning (DOMQ). The algorithm first builds an empirical model of the opponent-related environment, then uses the empirical model to learn a Nash equilibrium strategy for each opponent type. During actual play, the agent identifies the opponent type from the empirical model and applies the corresponding Nash equilibrium strategy, which guarantees a lower bound on its return. The only information the proposed DOMQ algorithm requires is the opponent's type at the end of each episode during the sampling phase; no other knowledge of the environment is needed. Simulation results verify the effectiveness of the proposed algorithm.
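The planning step the abstract describes, computing a minimax (maximin) value on an empirical model of the game against one opponent type, can be sketched as follows. This is a minimal illustration, not the paper's DOMQ algorithm: all names (`minimax_value_iteration`, the toy one-state game) are hypothetical, and the stage games are assumed to have pure-strategy saddle points so the minimax step reduces to `max_a min_b`; the general mixed-strategy case requires solving a matrix-game linear program at each state.

```python
def minimax_value_iteration(states, actions_a, actions_b, P, R, gamma=0.9, iters=200):
    """Value iteration for a zero-sum Markov game on an (empirical) model.

    Computes V(s) = max_a min_b [ R(s,a,b) + gamma * sum_s' P(s'|s,a,b) V(s') ],
    assuming every stage game has a pure-strategy saddle point.
    P[s][a][b] maps next states to probabilities; R[s][a][b] is the stage reward.
    """
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {
            s: max(
                min(
                    R[s][a][b] + gamma * sum(p * V[s2] for s2, p in P[s][a][b].items())
                    for b in actions_b
                )
                for a in actions_a
            )
            for s in states
        }
    return V


# Toy one-state game whose payoff matrix [[1, 2], [0, 3]] has a saddle point
# of value 1, so the discounted value converges to 1 / (1 - gamma) = 10.
states, A, B = ["s"], [0, 1], [0, 1]
P = {"s": {a: {b: {"s": 1.0} for b in B} for a in A}}
R = {"s": {0: {0: 1.0, 1: 2.0}, 1: {0: 0.0, 1: 3.0}}}
V = minimax_value_iteration(states, A, B, P, R)
```

In a DOMQ-style scheme, one such value (or Q) table would be learned per opponent type on that type's empirical model; at play time the agent dispatches to the table matching the identified opponent type, which is what guarantees the return lower bound.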

Key words

two-player zero-sum Markov game / incomplete information / minimax Q-learning / Nash equilibrium / multi-agent reinforcement learning


Publication year: 2024
Journal: Control Theory & Applications (sponsored by South China University of Technology and the Academy of Mathematics and Systems Science, Chinese Academy of Sciences)
Indexed in: CSTPCD, CSCD, PKU Core Journals
Impact factor: 1.076
ISSN: 1000-8152