
Prioritised experience replay based on sample optimisation

The sample-based prioritised experience replay proposed in this study addresses how samples are selected for the experience replay buffer, which improves training speed and increases the reward return. In the traditional deep Q-network (DQN), samples are placed into the experience replay buffer at random, yet each sample contributes differently to the agent's training, so a better sampling method makes training more effective. Therefore, when selecting samples for the experience replay buffer, the authors first let the agent learn randomly through a sample optimisation network and take the average of the returns obtained over these runs, using this mean as a threshold for admitting samples into the buffer. Second, on the basis of this sample optimisation, the authors add a priority update and use the idea of reward shaping to give additional reward to the returns of certain samples, which speeds up agent training. Experiments on the OpenAI Gym platform show that, compared with the traditional DQN and the prioritised-experience-replay DQN, the proposed method improves the agent's learning efficiency.
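The abstract describes three ingredients: a return threshold taken from an initial random-exploration (sample optimisation) phase that gates which transitions enter the replay buffer, priority updates for sampling, and a reward-shaping bonus on admitted samples. The sketch below is an illustrative Python reading of that idea, not the authors' implementation; the class name, method names, and hyperparameters (capacity, alpha, shaping_bonus) are all hypothetical.

```python
"""Sketch of a prioritised replay buffer with a return-based admission
threshold and a reward-shaping bonus, assuming a mean-return threshold
calibrated from an initial random-exploration phase (hypothetical API)."""
import numpy as np


class ThresholdedPrioritisedReplay:
    def __init__(self, capacity=10_000, alpha=0.6, shaping_bonus=0.1):
        self.capacity = capacity
        self.alpha = alpha                # priority exponent for sampling
        self.shaping_bonus = shaping_bonus
        self.buffer = []                  # stored transitions
        self.priorities = []              # one priority per transition
        self.threshold = -np.inf          # admission threshold on returns

    def calibrate(self, random_episode_returns):
        """Set the admission threshold to the mean return observed while
        the agent acted randomly (the 'sample optimisation' phase)."""
        self.threshold = float(np.mean(random_episode_returns))

    def add(self, transition, episode_return):
        """Admit a transition only if its episode return beats the threshold;
        admitted samples get a small shaping bonus and maximal priority."""
        if episode_return < self.threshold:
            return False
        state, action, reward, next_state, done = transition
        shaped = (state, action, reward + self.shaping_bonus, next_state, done)
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(shaped)
        self.priorities.append(max(self.priorities, default=1.0))
        return True

    def sample(self, batch_size):
        """Sample transitions proportionally to priority ** alpha."""
        p = np.asarray(self.priorities, dtype=np.float64) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=p)
        return idx, [self.buffer[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        """Refresh priorities from the latest TD errors."""
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(float(err)) + 1e-6


if __name__ == "__main__":
    buf = ThresholdedPrioritisedReplay(capacity=100)
    buf.calibrate(random_episode_returns=[2.0, 3.0, 4.0])  # threshold = 3.0
    buf.add((0, 1, 1.0, 1, False), episode_return=5.0)      # admitted
    buf.add((1, 0, 0.5, 2, False), episode_return=1.0)      # rejected
    idx, batch = buf.sample(batch_size=1)
    buf.update_priorities(idx, td_errors=[0.25])
```

In a DQN training loop, the sketch would replace the uniform replay buffer: transitions are offered with their episode's return, and only those passing the threshold are stored and later drawn with priority-proportional probability.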

learning (artificial intelligence); sampling methods; neural nets; optimisation; sample-based prioritised experience replay; sampling method; agent training; sample optimisation network; prioritised experience replay; DQN; deep Q-networks

Wang, Xuesong; Xiang, Haopeng; Cheng, Yuhu; Yu, Qiang


China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Jiangsu, Peoples R China|Xuzhou Key Lab Artificial Intelligence & Big Data, Xuzhou 221116, Jiangsu, Peoples R China

2020

The Journal of Engineering


ISSN:
Year, Volume (Issue): 2020, 2020(13)