A Dynamic ε-Greedy Exploration Strategy in Reinforcement Learning

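Abstract: As the field of reinforcement learning (RL) matures, the ε-greedy method has become widely used in RL, for example in deep Q-networks. However, each time the ε-greedy method selects an action, it picks a non-optimal action with some fixed probability, so it never stops exploring. To address this, a dynamic ε-greedy method (DEG) and a Dueling Actor-Critic framework (ACDD) are proposed to balance exploration and exploitation in RL. DEG feeds the state into the ACDD framework to obtain advantage values and uses them to adjust ε automatically, maintaining a better balance between exploration and exploitation. DEG is tested on a multi-armed bandit task, with the average cumulative reward and the optimal action selection rate as evaluation criteria. Compared with several widely used methods, DEG achieves a higher average cumulative reward and a higher optimal action selection rate.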
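To make the exploration problem concrete, here is a minimal sketch of a fixed-ε greedy agent on a multi-armed bandit that reports the two evaluation criteria named in the abstract. The Gaussian reward model, the sample-average value updates, and the function name `run_epsilon_greedy` are assumptions for illustration, not details taken from the paper. Because ε never changes, roughly ε·(k−1)/k of the actions stay non-optimal even after the value estimates have converged, which is the constant exploration the abstract describes.

```python
import random


def run_epsilon_greedy(true_means, epsilon, steps, seed=0):
    """Fixed-epsilon greedy agent on a Gaussian multi-armed bandit (illustrative).

    Returns (average cumulative reward, optimal action selection rate).
    """
    rng = random.Random(seed)
    k = len(true_means)
    best_arm = max(range(k), key=lambda a: true_means[a])
    q = [0.0] * k        # sample-average value estimate per arm
    counts = [0] * k     # number of pulls per arm
    total_reward = 0.0
    optimal_picks = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(k)                     # explore: uniform random arm
        else:
            action = max(range(k), key=lambda a: q[a])    # exploit: best current estimate
        reward = rng.gauss(true_means[action], 1.0)       # assumed unit-variance reward
        counts[action] += 1
        q[action] += (reward - q[action]) / counts[action]  # incremental mean update
        total_reward += reward
        optimal_picks += int(action == best_arm)

    return total_reward / steps, optimal_picks / steps


if __name__ == "__main__":
    means = [0.2, 0.5, 1.0, 0.1]  # hypothetical arm means; arm 2 is optimal
    for eps in (0.0, 0.1, 0.3):
        avg_reward, opt_rate = run_epsilon_greedy(means, eps, steps=5000)
        print(f"eps={eps}: average reward {avg_reward:.3f}, optimal-action rate {opt_rate:.1%}")
```

Comparing several fixed values of ε this way shows the trade-off a dynamic schedule tries to avoid: ε = 0 can lock onto a poor arm early, while a large ε keeps wasting pulls on arms already known to be worse.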

English title: A dynamic ε-greedy exploration strategy in reinforcement learning

English abstract: As the field of RL matures, the ε-greedy method has become widely used in RL, for example in deep Q-networks. However, every time ε-greedy chooses an action, it has a certain probability of choosing a non-optimal one, which leads to constant exploration. In this context, a dynamic ε-greedy algorithm (DEG) and a Dueling Actor-Critic framework (ACDD) are proposed to balance exploration and exploitation in RL. DEG inputs the state into the ACDD framework to obtain the advantage value, which is used to adjust ε automatically, thus maintaining a better balance between exploration and exploitation. The experiments test the performance of DEG on a multi-armed bandit task, using the average cumulative reward and the optimal action selection rate as evaluation criteria. Compared with several widely used algorithms, DEG achieves a higher average cumulative reward and a higher optimal action selection rate.
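The abstract does not specify how ACDD's advantage values are turned into ε, so the following is only a hypothetical sketch of the general idea. It replaces the learned Dueling Actor-Critic network with sample-average estimates split as in dueling architectures, A(a) = Q(a) − mean Q, and maps the largest advantage to ε through an exponential schedule; the function names and the `eps_min`, `eps_max` and `temperature` parameters are invented for illustration and are not the paper's. A multi-armed bandit has a single state, so no state input appears.

```python
import math
import random


def dynamic_epsilon(q_values, eps_min=0.01, eps_max=0.5, temperature=0.5):
    """Hypothetical rule mapping advantage estimates to an exploration rate.

    Advantages follow the dueling-style split A(a) = Q(a) - mean(Q), so the
    largest advantage is never negative. If one action clearly dominates,
    epsilon shrinks toward eps_min; if all actions look alike, it stays near eps_max.
    """
    mean_q = sum(q_values) / len(q_values)
    top_advantage = max(0.0, max(q_values) - mean_q)  # clamp guards float rounding
    return eps_min + (eps_max - eps_min) * math.exp(-top_advantage / temperature)


def run_dynamic_epsilon_greedy(true_means, steps, seed=0):
    """Bandit agent whose epsilon is recomputed from its estimates at every step.

    Returns (average cumulative reward, optimal action selection rate, final epsilon).
    """
    rng = random.Random(seed)
    k = len(true_means)
    best_arm = max(range(k), key=lambda a: true_means[a])
    q = [0.0] * k        # sample-average estimates stand in for a learned network
    counts = [0] * k
    total_reward, optimal_picks = 0.0, 0
    epsilon = dynamic_epsilon(q)

    for _ in range(steps):
        epsilon = dynamic_epsilon(q)                      # adapt epsilon before acting
        if rng.random() < epsilon:
            action = rng.randrange(k)                     # explore
        else:
            action = max(range(k), key=lambda a: q[a])    # exploit
        reward = rng.gauss(true_means[action], 1.0)       # assumed unit-variance reward
        counts[action] += 1
        q[action] += (reward - q[action]) / counts[action]
        total_reward += reward
        optimal_picks += int(action == best_arm)

    return total_reward / steps, optimal_picks / steps, epsilon


if __name__ == "__main__":
    avg_reward, opt_rate, final_eps = run_dynamic_epsilon_greedy([0.2, 0.5, 1.0, 0.1], steps=5000)
    print(f"average reward {avg_reward:.3f}, optimal-action rate {opt_rate:.1%}, final eps {final_eps:.3f}")
```

In this sketch ε falls only when one action's advantage stands clearly above the average, and it climbs back toward `eps_max` if the estimates draw together again, whereas a fixed ε keeps exploring at the same rate forever. A practical version would also need to guard against noisy early estimates cutting ε too soon.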

English keywords:

deep reinforcement learning; exploration and exploitation; dynamic ε; Dueling Actor-Critic framework; Multi-Armed Bandit

Authors:

孔燕、曹俊豪、杨智超、芮烨锋

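Affiliations:

School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044

Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing 210044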

Keywords:

deep reinforcement learning; exploration and exploitation; dynamic ε; Dueling Actor-Critic framework; multi-armed bandit

Funding:

National Natural Science Foundation of China

Grant number:

61602254

Publication year:

2024

DOI:

10.13274/j.cnki.hdzj.2024.07.010

Journal: Information Technology (信息技术)

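Sponsors: Heilongjiang Information Technology Society; China Center for Information Industry Development; Electronic Information Center, Ministry of Information Industry of China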

CSTPCD

Impact factor: 0.413

ISSN: 1009-2552

Year, volume (issue): 2024 (7)