Online opponent exploitation method based on particle swarm optimization for Texas Hold'em
In Texas Hold'em, opponent exploitation is more effective than Nash equilibrium approximation at extracting income from opponents with exploitable weaknesses. However, effectively exploiting a previously unseen opponent during online play remains a challenge. Existing methods typically sidestep this problem through offline training and online adaptation: learning or evolutionary methods are applied in massive offline training to obtain a model that can adapt to different opponents during play, rather than actively optimizing a policy against the new opponent online. To achieve effective opponent exploitation through online active policy optimization, a policy optimization method based on particle swarm optimization (PSO) is proposed to maximize competition income, introducing online optimization into Texas Hold'em, a game with strong randomness. Standard PSO faces two problems in this setting: fitness computation is distorted by random luck, and targeted policies against some opponents are hard to optimize. To address them, a modified PSO method called BR-PSO (best replacement PSO) is proposed, based on replacement of the local optimal solution and the global optimal solution. Experimental results indicate that the proposed method finds targeted policies that maximize exploitation of opponents which standard PSO fails to counter, and that the income of the optimized policy is comparable to that of an AI based on the hand prediction method.
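The abstract does not give the details of BR-PSO, but the core idea it names, re-examining and replacing the stored personal and global bests so that a single lucky fitness draw cannot lock in a poor solution, can be sketched as follows. This is a minimal illustration under assumed design choices (averaging `n_eval` noisy evaluations, re-scoring stored bests each iteration); the function names, parameters, and update rules are hypothetical, not the paper's actual algorithm.

```python
import random

def br_pso(fitness, dim, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5,
           n_eval=5, seed=0):
    """PSO sketch with best replacement for noisy fitness (maximization).

    `fitness(x)` returns a noisy scalar reward (e.g. income against an
    opponent over a few hands). Stored bests are re-evaluated each
    iteration so a luck-inflated score can be overturned.
    """
    rng = random.Random(seed)

    def avg_fit(x):
        # Average several noisy evaluations to damp random luck.
        return sum(fitness(x) for _ in range(n_eval)) / n_eval

    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [avg_fit(p) for p in pbest]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            # Best replacement: re-score the stored personal best so a
            # stale, luck-inflated value does not block a real improvement.
            pbest_f[i] = avg_fit(pbest[i])
            f = avg_fit(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
        # Re-score the stored global best for the same reason.
        gbest_f = avg_fit(gbest)
        g = max(range(n_particles), key=lambda i: pbest_f[i])
        if pbest_f[g] > gbest_f:
            gbest, gbest_f = pbest[g][:], pbest_f[g]

    return gbest, gbest_f
```

In a poker setting, `x` would encode policy parameters and `fitness` would play a batch of hands against the current opponent; here any noisy objective demonstrates the mechanism.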