Prioritised experience replay based on sample optimisation
The sample-optimisation-based prioritised experience replay proposed in this study addresses how to select samples for the experience replay buffer, improving training speed and increasing the reward return. Traditional deep Q-networks (DQNs) place samples into the experience replay buffer at random, yet each sample contributes differently to the agent's training, and a better sampling method makes training more effective. Therefore, when selecting samples for the experience replay buffer, the authors first let the agent learn randomly through a sample optimisation network and take the average of the returns obtained after each learning run; this mean value is then used as a threshold for admitting samples into the experience replay buffer. Second, building on sample optimisation, the authors add a priority update and apply the idea of reward shaping to give additional reward to the returns of certain samples, which speeds up agent training. Experiments on the OpenAI Gym platform show that, compared with the traditional DQN and the prioritised experience replay DQN, the proposed method improves the agent's learning efficiency.
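The following is a minimal sketch of the idea described in the abstract, not the authors' implementation: a replay buffer that admits a transition only when its episode return reaches the running mean of returns seen so far, applies a reward-shaping bonus to admitted samples, and updates priorities from TD errors as in prioritised experience replay. The class name, the `shaping_bonus` and `alpha` parameters, and the buffer layout are all assumptions made for illustration.

```python
import numpy as np
from collections import deque


class MeanThresholdReplayBuffer:
    """Hypothetical sketch: mean-return threshold for admission into the
    replay buffer, reward shaping on admitted samples, and TD-error-based
    priority updates (as in prioritised experience replay)."""

    def __init__(self, capacity=10000, shaping_bonus=0.1, alpha=0.6):
        self.buffer = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.running_sum = 0.0   # sum of episode returns seen so far
        self.count = 0           # number of returns seen so far
        self.shaping_bonus = shaping_bonus  # assumed extra reward for admitted samples
        self.alpha = alpha                  # priority exponent

    def add(self, state, action, reward, next_state, done, episode_return):
        # Update the running mean used as the admission threshold.
        self.running_sum += episode_return
        self.count += 1
        threshold = self.running_sum / self.count

        if episode_return < threshold:
            return False  # discard transitions whose return is below the mean

        # Reward shaping: give an additional reward to admitted samples.
        shaped_reward = reward + self.shaping_bonus

        # New samples start at the current maximum priority.
        max_p = max(self.priorities, default=1.0)
        self.buffer.append((state, action, shaped_reward, next_state, done))
        self.priorities.append(max_p)
        return True

    def sample(self, batch_size):
        # Sample transitions with probability proportional to priority^alpha.
        probs = np.array(self.priorities, dtype=np.float64) ** self.alpha
        probs /= probs.sum()
        indices = np.random.choice(len(self.buffer), batch_size, p=probs)
        return indices, [self.buffer[i] for i in indices]

    def update_priorities(self, indices, td_errors):
        # Priority update from the absolute TD errors of the sampled batch.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + 1e-6
```

In a DQN training loop on an OpenAI Gym environment, `add` would be called at the end of each episode for every stored transition (passing that episode's return), `sample` would provide minibatches for the Q-network update, and `update_priorities` would be called with the resulting TD errors; the exact training procedure in the paper may differ.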
China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Jiangsu, Peoples R China; Xuzhou Key Lab Artificial Intelligence & Big Data, Xuzhou 221116, Jiangsu, Peoples R China