
DouDiZhu ("Fight the Landlord") strategy based on an improved DDQN method

To address the long training time, large action space, and low winning rate of existing methods in card games, improvements to the network architecture and encoding scheme of the DDQN algorithm are proposed. Hand-card features are encoded in binary; following a hand-splitting approach, the neural network is divided into a main-card network and a kicker-card network; and a GRU network is added to process action sequences. Experiments show that the training time of the improved algorithm is 13% shorter than that of the traditional DQN algorithm, and its average winning rates in the "landlord" and "farmer" positions are 70% and 75%, higher than the DQN algorithm's 28% and 60%, demonstrating the advantages of the improved algorithm on these metrics.
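The two architectural ideas summarized above, a binary encoding of the hand and a Q-network split into main-card and kicker heads with a GRU over the action history, can be sketched roughly as below. This is an illustrative reconstruction under assumptions, not the authors' code: the 4x15 encoding layout, all layer and action-space sizes, and the identifiers (encode_hand, SplitDDQNNet) are hypothetical.

```python
# Sketch only: a plausible binary hand encoding and a split Q-network for
# DouDiZhu, with a GRU over past actions. Sizes and layout are assumptions.
import torch
import torch.nn as nn

RANKS = list("3456789TJQKA2") + ["BJ", "RJ"]  # 13 ordinary ranks + two jokers

def encode_hand(hand):
    """Encode a hand as a 4 x 15 binary matrix: column r has its first
    count(r) entries set to 1 (count is 0..4 per rank, 0..1 for jokers)."""
    m = torch.zeros(4, len(RANKS))
    for r, rank in enumerate(RANKS):
        for i in range(min(hand.count(rank), 4)):
            m[i, r] = 1.0
    return m.flatten()  # 60-dimensional binary feature vector

class SplitDDQNNet(nn.Module):
    """Q-network with a GRU over the action history and two heads:
    one scoring main-card actions, one scoring kicker cards."""
    def __init__(self, hand_dim=60, action_dim=60, hidden=128,
                 n_main_actions=309, n_kicker_actions=28):  # sizes are illustrative
        super().__init__()
        self.gru = nn.GRU(action_dim, hidden, batch_first=True)
        self.trunk = nn.Sequential(nn.Linear(hand_dim + hidden, 256), nn.ReLU())
        self.main_head = nn.Linear(256, n_main_actions)      # Q-values for main cards
        self.kicker_head = nn.Linear(256, n_kicker_actions)  # Q-values for kickers

    def forward(self, hand_feat, action_history):
        _, h = self.gru(action_history)            # h: (num_layers, batch, hidden)
        x = self.trunk(torch.cat([hand_feat, h[-1]], dim=-1))
        return self.main_head(x), self.kicker_head(x)

# Toy usage: one batch element and a short (zero-padded) history of 3 moves.
hand = ["3", "3", "7", "T", "J", "Q", "K", "A", "2", "BJ"]
feat = encode_hand(hand).unsqueeze(0)              # (1, 60)
history = torch.zeros(1, 3, 60)                    # (batch, seq, action_dim)
q_main, q_kicker = SplitDDQNNet()(feat, history)
print(q_main.shape, q_kicker.shape)                # (1, 309) and (1, 28)
```

The DDQN-specific part of training is not shown: in that loop the online network would select the greedy next action and a separate target network would evaluate it when forming the temporal-difference target.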

deep reinforcement learning; Double deep Q-learning; computer games; Gate Recurrent Unit network; large-scale discrete action space

孔燕、吴晓聪、芮烨锋、史鸿远


School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China

Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing 210044, China


National Natural Science Foundation of China

61602254

2024

Information Technology (信息技术)
Heilongjiang Information Technology Society; China Center for Information Industry Development; Electronic Information Center of the Ministry of Information Industry of China


CSTPCD
Impact factor: 0.413
ISSN: 1009-2552
Year, Volume (Issue): 2024, (5)