Research on a Two-Level Transposition Table Optimization Algorithm Based on Deep Reinforcement Learning

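Abstract (translated from the Chinese): Computer game programs based on deep reinforcement learning, such as AlphaGo, have defeated human world champions at Go. These algorithms use learnable value and policy neural networks to guide the exploration of the Monte Carlo tree. Many improvements have been proposed to raise the search performance of Monte Carlo tree search, and transposition tables have been shown to improve search efficiency. Building on this, the paper proposes a new transposition-table-based method: a two-level transposition table optimization algorithm based on deep reinforcement learning. The method manages a two-level transposition table with different replacement strategies and decouples the two-stone move of Connect6 into two independent neural networks. This both shrinks the action space and makes the networks easier to train. Experiments on Connect6 show that, under limited computational resources, the method significantly improves the position hash hit rate and the program's playing strength.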
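To give a rough sense of why splitting the two-stone move shrinks the action space, the count below compares one joint output head with two single-stone heads. It is only an illustrative calculation: the 19x19 board size is an assumption (the abstract does not state the board used), and the counts are for an empty board.

```python
from math import comb

BOARD_CELLS = 19 * 19  # assumed 19x19 board; not stated in the abstract

# One network choosing both stones at once: every unordered pair of empty cells.
joint_actions = comb(BOARD_CELLS, 2)

# Two networks choosing one stone each: each head needs at most one output per cell.
first_stone_actions = BOARD_CELLS
second_stone_actions = BOARD_CELLS - 1  # one cell is taken by the first stone

print(joint_actions, first_stone_actions, second_stone_actions)
```

Under these assumptions the joint head would need tens of thousands of outputs, while each decoupled head needs only a few hundred.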

English title: Two-level transposition table optimization algorithm based on deep reinforcement learning

English abstract: Computer game programs based on deep reinforcement learning, such as AlphaGo, have beaten human world champions in the game of Go. These algorithms use learnable value and policy neural networks to guide the exploration process of Monte Carlo Tree Search. Various enhancements have been proposed to improve the search performance of Monte Carlo trees, among which the transposition table has been proven to improve search efficiency. Building on this foundation, this paper introduces a novel method, the two-level transposition table optimization algorithm based on deep reinforcement learning. The method manages a two-level transposition table using distinct replacement strategies and decouples the two-step moves of Connect6 into two independent neural networks. This not only reduces the size of the action space but also simplifies neural network training. Experimental results using Connect6 as an example demonstrate that this approach significantly enhances the program's playing strength under limited computational resources.
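The idea of managing two table levels with distinct replacement strategies can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not say which strategies are used, so the sketch assumes the classical pairing of a protected slot (here keeping the entry with the most visits) and an always-replace slot. The `Entry` and `TwoLevelTable` names, the `visits` priority, and the integer keys standing in for position hashes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    key: int      # full hash of the position (hypothetical, e.g. a Zobrist key)
    value: float  # stored evaluation, e.g. a value-network output
    visits: int   # search effort behind this entry (assumed priority measure)


class TwoLevelTable:
    """Each bucket has two slots governed by different replacement rules."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.first: List[Optional[Entry]] = [None] * size   # keeps the most-visited entry
        self.second: List[Optional[Entry]] = [None] * size  # always overwritten
        self.hits = 0
        self.probes = 0

    def store(self, key: int, value: float, visits: int) -> None:
        i = key % self.size
        new = Entry(key, value, visits)
        old = self.first[i]
        if old is None or old.key == key or visits >= old.visits:
            # The new entry takes the protected slot; a displaced entry for a
            # different position is demoted instead of being lost.
            if old is not None and old.key != key:
                self.second[i] = old
            self.first[i] = new
        else:
            # Lower-priority entries only ever overwrite the second slot.
            self.second[i] = new

    def probe(self, key: int) -> Optional[Entry]:
        self.probes += 1
        i = key % self.size
        for slot in (self.first[i], self.second[i]):
            if slot is not None and slot.key == key:
                self.hits += 1
                return slot
        return None


table = TwoLevelTable(4)
table.store(1, 0.5, 10)   # heavily searched position
table.store(5, -0.2, 3)   # same bucket, fewer visits: goes to the second slot
table.store(9, 0.1, 1)    # same bucket again: overwrites only the second slot
```

In this toy run the heavily searched entry for key 1 survives two collisions, whereas a single always-replace table would have evicted it. The `hits` and `probes` counters give a simple way to measure a hit rate.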

English keywords:

deep reinforcement learning; transposition table; computer game; AlphaGo; MCTS

Authors:

Wang Dongnian (王栋年), Wang Junwei (王军伟), Xue Shichao (薛世超), Wang Chao (汪超), Xu Changming (徐长明)

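Author affiliations:

Graduate School, Northeastern University, Qinhuangdao, Hebei 066004

School of Computer and Communication Engineering, Northeastern University at Qinhuangdao, Qinhuangdao, Hebei 066004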

Keywords:

deep reinforcement learning; transposition table; computer game; AlphaGo; Monte Carlo tree

Funding:

Natural Science Foundation of Hebei Province, General Program

Project number:

F2023501006

Publication year:

2024

DOI:

10.3969/j.issn.1674-8425(z).2024.05.019

Journal of Chongqing University of Technology (重庆理工大学学报)

Chongqing University of Technology

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.567

ISSN: 1674-8425

Year, volume (issue): 2024, 38(9)

Number of references: 5