Research on Mahjong Game Playing Combining A2C and a Hand Valuation Method

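Abstract (translated from the Chinese): To address the insufficient use of hand-tile information in popular mahjong (大众麻将), a hand valuation method is proposed and a baseline mahjong program (MJE) is designed. To further improve the playing strength of the mahjong AI, a second agent (MJE-RL) is designed using deep reinforcement learning. First, MJE self-play generates the training data for deep learning. Second, based on results on the training set, the test set, and comparison experiments, the best-performing model is selected as the pre-trained model for reinforcement learning. Finally, the Advantage Actor-Critic (A2C) model serves as the main reinforcement learning framework: the trained deep learning model acts as the Actor and makes decisions, and the AI's playing strength is improved continually through games between MJE-RL and MJE. Experimental results show that MJE-RL's win rate is 4.08% higher than MJE's and its deal-in rate (点炮率) is 3.02% lower, indicating that MJE-RL improves in both attack and defense and achieves the goal of strengthening the mahjong AI.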

Published English title: Research on mahjong game combining A2C with hand value evaluation method

Published English abstract: To address the underutilization of hand information in popular mahjong, this paper designs a hand valuation method and a basic mahjong program (MJE). A mahjong AI (MJE-RL) is then designed using deep reinforcement learning to further improve playing ability. First, the training data for deep learning are generated by MJE's self-play. Second, the best model is selected as the pre-trained model for reinforcement learning, according to results on the training set, the test set, and comparison experiments. Finally, the Advantage Actor-Critic (A2C) model is employed as the main reinforcement learning framework. The trained deep learning model is used as the Actor to make decisions, and the playing ability of the mahjong AI is steadily improved through games between MJE-RL and MJE. Experimental results indicate that the winning rate of MJE-RL is 4.08% higher than that of MJE and that its deal-in rate (the rate of discarding an opponent's winning tile) is 3.02% lower. This shows that MJE-RL improves on both the offensive and defensive fronts, demonstrating improved overall strength of the mahjong AI.
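The abstract names A2C as the reinforcement learning framework but gives no formulas. As a rough illustration only, the sketch below computes the standard one-step A2C quantities for a single transition: the advantage, the actor (policy) loss, and the critic (value) loss. It is not the authors' implementation. The function names `softmax` and `a2c_losses`, the discount factor `gamma=0.99`, and the idea that the logits score candidate discards are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def a2c_losses(logits, action, reward, value, next_value, gamma=0.99, done=False):
    """One-step A2C losses for a single transition (hypothetical helper).

    logits     -- actor scores for each candidate action (e.g. each discard)
    action     -- index of the action actually taken
    reward     -- reward observed after the action
    value      -- critic estimate V(s) for the current state
    next_value -- critic estimate V(s') for the next state
    gamma      -- discount factor (assumed value, not from the paper)
    done       -- True if the episode ended, so V(s') is not bootstrapped
    """
    probs = softmax(logits)
    target = reward + (0.0 if done else gamma * next_value)
    advantage = target - value  # A(s, a) = r + gamma * V(s') - V(s)
    # Policy-gradient term; the advantage is treated as a constant here.
    actor_loss = -math.log(probs[action]) * advantage
    # Squared TD error for the critic.
    critic_loss = advantage ** 2
    return advantage, actor_loss, critic_loss


if __name__ == "__main__":
    # Terminal transition with two equally scored actions.
    print(a2c_losses([0.0, 0.0], action=0, reward=1.0,
                     value=0.5, next_value=0.0, done=True))
```

In a setup like the one the abstract describes, the logits would presumably come from the pre-trained deep learning model acting as the Actor, and a positive advantage would push up the probability of the chosen action.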

Published English keywords:

popular mahjong; incomplete information; deep reinforcement learning; A2C

Authors:

衣御寒, 王亚杰, 吴燕燕, 刘松, 张兴慧, 蒋传禹

Affiliation:

Engineering Training Center, Shenyang Aerospace University, Shenyang 110136

Keywords:

mahjong; incomplete information; deep reinforcement learning; A2C

Funding:

Liaoning Province "Xingliao Talents" Program (辽宁省兴辽英才计划项目)

Project number:

XLYC1906003

Publication year:

2024

DOI:

10.3969/j.issn.1674-8425(z).2024.05.020

重庆理工大学学报 (Journal of Chongqing University of Technology)

重庆理工大学 (Chongqing University of Technology)

Indexing: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.567

ISSN: 1674-8425

Year, volume (issue): 2024, 38(9)

Number of references: 9