A Review of Imperfect Information Games: Adversarial Solving Methods and Comparative Analysis

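Abstract (translated from the Chinese): Artificial intelligence has become a new engine of economic development and a core driver of the latest round of industrial transformation. "Game intelligence", an emerging research field that combines artificial intelligence with game theory, has attracted growing interest from researchers and has been widely applied in practice. As a typical class of game intelligence, imperfect-information games model how multiple agents behave when each holds private information. They can capture a broader range of decision processes than perfect-information games and have wide real-world application, for example in financial trading, business negotiation, and military confrontation. In recent years, research on solving imperfect-information games has made breakthrough progress, producing two major families of offline solving methods whose core techniques are regret minimization and best response. The former improves an agent's strategy toward equilibrium by reflecting on the agent's own past decisions, and has successfully solved classic imperfect-information games typified by Texas Hold'em poker. The latter improves an agent's strategy toward equilibrium by responding in a targeted way to the opponent's decisions, and plays a key role in training AI for large real-time strategy games such as StarCraft and DOTA. In addition, a series of online solving methods can further optimize, in real time, the blueprint strategy obtained by offline algorithms, so that it keeps improving during live play; these methods have become a key technology for solving imperfect-information games. Starting from the concept and characteristics of imperfect-information games, this paper comprehensively introduces the basic principles, development history, and improvement techniques of these three classes of methods, compares their strengths and weaknesses in depth, and looks ahead to future research directions. We hope that this thorough survey of imperfect-information game solving will further advance game intelligence technology and help move toward artificial general intelligence.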

English title: A Review of Imperfect Information Games: Adversarial Solving Methods and Comparative Analysis

English abstract: Artificial Intelligence (AI) has emerged as a pivotal force in the latest industrial revolution and has become a national strategic priority. The fusion of AI and game theory has given rise to "Game Intelligence" as a leading research domain. Among the diverse facets of game intelligence, Imperfect-Information Games (IIGs) stand out for their ability to simulate the strategic decision-making of multiple agents amidst private information, an accurate portrayal of many real-world scenarios. Compared to perfect-information games, IIGs offer a more nuanced understanding of decision-making processes, making them applicable across various real-world domains such as financial trading, business negotiations, and military operations. Recent strides in IIG research have led to the emergence of two primary streams of offline solving methods: Regret Minimization and Best Response. Regret Minimization continually refines its strategy towards equilibrium by learning from past decisions, making it particularly advantageous in scenarios with unknown or uncertain opponent strategies. On the other hand, Best Response fine-tunes its strategy towards equilibrium by devising tailored countermeasures against opponents' decisions, proving pivotal in training AI for large-scale real-time strategy games like StarCraft and DOTA. The efficacy of the Best Response approach hinges on its ability to anticipate and counteract opponents' moves. Moreover, search-based online solving methods optimize blueprint strategies in real time, facilitating precise Nash equilibrium solutions and constituting a critical technology in IIG solving. The synergy of offline and online solving methods equips AI with the capability to navigate the intricacies of IIGs and attain optimal solutions. This survey aims to provide a comprehensive exploration of the realm of IIGs. Beginning with an elucidation of the concept of IIGs and their distinguishing features, the survey offers an overview of the methods employed for their resolution. Subsequently, it delves into the fundamental principles and historical context of these methods, alongside delineating advanced techniques to enhance their efficacy. Additionally, the survey conducts an exhaustive comparison of the strengths and weaknesses of various methods, while providing insights into future research trajectories. It is our aspiration that through this comprehensive scrutiny of IIGs, this survey will drive advancements in game intelligence technology and contribute to the development of artificial intelligence.
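
As a rough illustration of the regret-minimization idea summarized in the abstract, the sketch below runs regret matching in self-play on rock-paper-scissors. It is not taken from the paper: the game, the function names, the iteration count, and the biased starting regret are all assumptions chosen for brevity, and rock-paper-scissors is only a small normal-form stand-in for the extensive-form games (such as poker) that the survey covers. Each player accumulates, per action, how much better that action would have done than the mix actually played, then plays in proportion to the positive part of that total. In two-player zero-sum games the time-averaged strategies of this procedure approach a Nash equilibrium, which here is the uniform mix.

```python
# Row player's payoff in rock-paper-scissors; the column player receives the negative.
# Action order (an assumption of this sketch): 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def strategy_from_regrets(cum_regret):
    """Regret matching: weight each action by its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No action has positive regret yet, so fall back to the uniform mix.
    return [1.0 / len(cum_regret)] * len(cum_regret)


def self_play(iterations=20000):
    """Return both players' time-averaged strategies after regret-matching self-play."""
    # Hypothetical biased start for player 0 so the dynamics are not trivially uniform.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        s0 = strategy_from_regrets(regrets[0])
        s1 = strategy_from_regrets(regrets[1])
        # Expected payoff of each pure action against the opponent's current mix.
        u0 = [sum(PAYOFF[a][b] * s1[b] for b in range(3)) for a in range(3)]
        u1 = [-sum(PAYOFF[a][b] * s0[a] for a in range(3)) for b in range(3)]
        for s, u, reg, acc in ((s0, u0, regrets[0], strategy_sums[0]),
                               (s1, u1, regrets[1], strategy_sums[1])):
            expected = sum(p * v for p, v in zip(s, u))
            for a in range(3):
                reg[a] += u[a] - expected  # regret for not having played action a
                acc[a] += s[a]             # accumulate for the average strategy
    return [[x / iterations for x in acc] for acc in strategy_sums]


average_strategies = self_play()
```

The quantity to inspect is `average_strategies`, not the final per-iteration strategy: the individual iterates can keep cycling, while their average is what carries the convergence guarantee.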

English keywords:

imperfect information game; regret minimization; best response; safe search; reinforcement learning

Authors:

Yu Chao (余超), Liu Zongkai (刘宗凯), Hu Chaohao (胡超豪), Huang Kaiqi (黄凯奇), Zhang Junge (张俊格)

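Affiliations:

School of Computer Science, Sun Yat-sen University, Guangzhou 510006

Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 100190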

Keywords:

imperfect-information game; regret minimization; best response; online solving; reinforcement learning

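Funding:

General Program of the National Natural Science Foundation of China; Natural Science Foundation of Guangdong Province; Basic Cultivation Fund Project of the Chinese Academy of Sciences; Fundamental Research Funds for the Central Universities, Sun Yat-sen University; Youth Innovation Promotion Association of the Chinese Academy of Sciences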

Grant numbers:

62076259; 2023A1515012946; JCPYJJ-22017

Publication year:

2024

DOI:

10.11897/SP.J.1016.2024.02211

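Chinese Journal of Computers (计算机学报)

China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences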

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 3.18

ISSN: 0254-4164

Year, volume (issue): 2024, 47(9)

Number of references: 6