A dynamic ε-greedy exploration strategy in reinforcement learning
As the field of RL matures, the ε-greedy method is widely used in RL, for example in the deep Q-network. However, when choosing actions, ε-greedy selects non-optimal actions with a fixed probability, leading to constant exploration. In this context, a Dynamic ε-greedy algorithm (DEG) and a Dueling Actor-Critic framework (ACDD) are proposed, which are able to balance exploration and exploitation in RL. DEG feeds the state into the ACDD framework to obtain the advantage value, which is used to automatically adjust the value of ε, thus maintaining a better balance between exploration and exploitation. Experiments test the performance of DEG on a Multi-Armed Bandit task, using the average cumulative reward and the optimal action selection rate as evaluation criteria. Compared with several widely used algorithms, DEG achieves a higher average cumulative reward and a higher optimal action selection rate.
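The abstract does not specify how the advantage value is mapped to ε, so the following is a minimal sketch under one plausible reading: when the advantage of the best action is large (the agent is confident), ε shrinks toward a floor; when value estimates are close, ε grows toward a ceiling. The exponential mapping, its slope `k`, and the bounds `eps_min`/`eps_max` are illustrative assumptions, not the paper's specification, and the bandit setting stands in for the full ACDD framework.

```python
import numpy as np

class DynamicEpsGreedyBandit:
    """Sketch of a dynamic ε-greedy agent for a Multi-Armed Bandit.

    Assumption: ε is driven by an advantage proxy (best value estimate
    minus the mean estimate); the exact DEG/ACDD mechanism differs.
    """

    def __init__(self, n_arms, eps_min=0.01, eps_max=0.5, k=2.0):
        self.q = np.zeros(n_arms)        # per-arm value estimates
        self.counts = np.zeros(n_arms)   # per-arm pull counts
        self.eps_min, self.eps_max, self.k = eps_min, eps_max, k

    def epsilon(self):
        # Advantage proxy: gap between the best and the average estimate.
        adv = self.q.max() - self.q.mean()
        # Large advantage -> confident -> explore less; small gap -> explore more.
        return self.eps_min + (self.eps_max - self.eps_min) * np.exp(-self.k * adv)

    def act(self, rng):
        if rng.random() < self.epsilon():
            return int(rng.integers(len(self.q)))   # explore: random arm
        return int(self.q.argmax())                 # exploit: greedy arm

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental sample-average update of the value estimate.
        self.q[arm] += (reward - self.q[arm]) / self.counts[arm]

# Usage: a 10-armed Gaussian bandit, as in classic bandit test beds.
rng = np.random.default_rng(0)
true_means = rng.normal(0.0, 1.0, 10)
agent = DynamicEpsGreedyBandit(n_arms=10)
for _ in range(1000):
    arm = agent.act(rng)
    agent.update(arm, rng.normal(true_means[arm], 1.0))
print("estimated best arm:", int(agent.q.argmax()),
      "| true best arm:", int(true_means.argmax()))
```

Under this reading, ε decays automatically as one arm's estimated value pulls away from the rest, which is the exploration-exploitation balancing behavior the abstract attributes to DEG.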
Keywords: deep reinforcement learning; exploration and exploitation; dynamic ε; Dueling Actor-Critic framework; Multi-Armed Bandit