To address the slow convergence of value-function iteration and the low experience-utilization rate of the deep deterministic policy gradient (DDPG) algorithm, a DDPG algorithm based on Steffensen value iteration and attentive experience replay was proposed. The Steffensen iteration method was applied to the value iteration process to improve its convergence speed. The attentive experience replay mechanism made the agent focus on learning experiences that contain states frequently visited by the current policy: it calculated the similarity between the agent's current state and the states stored in the experiences, and preferentially sampled the experiences with high similarity. Results of experiments on six continuous-action control tasks in the PyBullet environment demonstrate that the proposed algorithm converges faster and performs better than the DDPG algorithm, the twin delayed deep deterministic policy gradient (TD3) algorithm, the cycling decay learning rate DDPG (CDLR-DDPG) algorithm, and the DDPG with episode experience replay (EER-DDPG) algorithm.
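The Steffensen acceleration referred to above is, in its classical form, Aitken's delta-squared formula applied to a fixed-point iteration; value iteration is itself a fixed-point iteration on the Bellman operator, which is why the acceleration applies. A minimal sketch of the general technique on a scalar fixed-point map (not the paper's exact network-based update; the function names are illustrative):

```python
def steffensen(f, x0, tol=1e-10, max_iter=100):
    """Steffensen-accelerated fixed-point iteration for x = f(x).

    Each step evaluates f twice and applies Aitken's delta-squared
    extrapolation, which typically converges much faster than the
    plain iteration x <- f(x).
    """
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        ffx = f(fx)
        denom = ffx - 2.0 * fx + x
        if abs(denom) < 1e-15:  # degenerate step: iteration has converged
            return fx
        # Aitken's delta-squared extrapolation
        x_new = x - (fx - x) ** 2 / denom
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

For a contraction such as f(x) = 0.5x + 1 (fixed point x = 2), the accelerated iteration reaches the fixed point essentially immediately, whereas plain iteration only halves the error per step.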
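The attentive sampling idea described in the abstract can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: it uses cosine similarity between the current state and buffered states and a softmax over similarities as the sampling distribution; the actual similarity measure and weighting scheme are assumptions here.

```python
import numpy as np

def attentive_sample(buffer_states, current_state, batch_size, rng=None):
    """Sample a minibatch of experience indices biased toward experiences
    whose stored state is similar to the agent's current state.

    buffer_states : array-like of shape (N, d), states in the replay buffer
    current_state : array-like of shape (d,)
    """
    rng = np.random.default_rng() if rng is None else rng
    S = np.asarray(buffer_states, dtype=float)
    c = np.asarray(current_state, dtype=float)
    # Cosine similarity between the current state and every stored state
    sims = S @ c / (np.linalg.norm(S, axis=1) * np.linalg.norm(c) + 1e-8)
    # Softmax turns similarities into a sampling distribution that
    # prefers high-similarity experiences but keeps all of them reachable
    probs = np.exp(sims - sims.max())
    probs /= probs.sum()
    return rng.choice(len(S), size=batch_size, replace=False, p=probs)
```

In a full DDPG training loop the returned indices would select (s, a, r, s') tuples for the critic and actor updates in place of uniform sampling.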
Key words
deep reinforcement learning/deep deterministic policy gradient/continuous control tasks/value iteration/experience replay/cumulative reward/attentive experience replay