Computer Engineering and Design, 2024, Vol. 45, Issue 12: 3607-3614. DOI: 10.16208/j.issn1000-7024.2024.12.012

Improved DDPG algorithm based on Steffensen value iteration

张秋娟 宋文广 李博文
Author information

  • 1. School of Computer Science, Yangtze University, Jingzhou 434023, Hubei, China


Abstract

To address the slow convergence of value-function iteration learning and the low experience utilization rate of the deep deterministic policy gradient (DDPG) algorithm, a DDPG algorithm based on Steffensen value iteration and attentive experience replay was proposed. The Steffensen iteration method was applied to the value iteration process to improve its convergence speed. The attentive experience replay mechanism computed the similarity between the agent's current state and the states stored in past experiences and preferentially sampled experiences with high similarity, so that the agent focused on learning experiences containing states frequently visited by the current policy. Results of experiments on six continuous-action control tasks in the PyBullet environment demonstrate that the proposed algorithm converges faster and performs better than the DDPG algorithm, the twin delayed deep deterministic policy gradient (TD3) algorithm, the cycling decay learning rate DDPG (CDLR-DDPG) algorithm, and the DDPG with episode experience replay (EER-DDPG) algorithm.
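Steffensen's method accelerates a fixed-point iteration x = g(x) by Aitken's Δ² extrapolation, using two successive applications of the operator to jump ahead of plain iteration. The sketch below illustrates the idea for tabular value iteration; the small MDP (matrices `P`, rewards `R`, discount `gamma`), function names, and the residual safeguard are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Illustrative 2-state, 2-action MDP (values chosen only for demonstration).
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # P[a][s, s'] for action a
     np.array([[0.1, 0.9], [0.7, 0.3]])]
R = np.array([[1.0, 0.0], [0.0, 2.0]])     # R[s, a]
gamma = 0.9

def bellman(V):
    """One application of the Bellman optimality operator T."""
    Q = np.stack([R[:, a] + gamma * P[a] @ V for a in range(2)], axis=1)
    return Q.max(axis=1)

def steffensen_value_iteration(V, iters=100, eps=1e-10):
    """Value iteration accelerated by componentwise Aitken/Steffensen
    extrapolation: x* ~= x - (Tx - x)^2 / (T^2 x - 2 Tx + x)."""
    for _ in range(iters):
        V1 = bellman(V)
        V2 = bellman(V1)
        denom = V2 - 2.0 * V1 + V
        # Extrapolate only where the denominator is well-conditioned;
        # elsewhere fall back to the plain two-step update V2.
        V_acc = np.where(np.abs(denom) > eps, V - (V1 - V) ** 2 / denom, V2)
        # Safeguard (an assumption of this sketch): keep the accelerated
        # iterate only if its Bellman residual beats that of V2.
        if np.max(np.abs(bellman(V_acc) - V_acc)) < np.max(np.abs(bellman(V2) - V2)):
            V = V_acc
        else:
            V = V2
    return V
```

The residual safeguard guarantees the accelerated scheme is never slower than plain value iteration, which matters because the Bellman operator is only piecewise linear and raw Aitken extrapolation can overshoot near policy-switch points.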


Key words

deep reinforcement learning; deep deterministic policy gradient; continuous control tasks; value iteration; experience replay; cumulative reward; attentive experience replay
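The attentive experience replay idea, sampling transitions whose stored states resemble the agent's current state, can be sketched as follows. The buffer contents, state dimension, cosine-similarity measure, and top-k cutoff are illustrative assumptions of this sketch, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical replay buffer: 1000 stored transition states of dimension 4.
buffer_states = rng.normal(size=(1000, 4))

def attentive_sample(current_state, states, k=64, batch_size=32):
    """Rank stored experiences by cosine similarity to the agent's current
    state, restrict to the k most similar, then sample the training batch."""
    q = current_state / np.linalg.norm(current_state)
    sims = states @ q / np.linalg.norm(states, axis=1)
    top_k = np.argsort(sims)[-k:]          # indices of the k most similar states
    return rng.choice(top_k, size=batch_size, replace=False)

query = rng.normal(size=4)                 # the agent's current state
batch_idx = attentive_sample(query, buffer_states)
```

Sampling uniformly from a top-k pool rather than taking the k best directly preserves some diversity in the batch while still biasing learning toward states the current policy visits often.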


Publication year

2024
Sponsor: The 706th Institute of the Second Academy, China Aerospace Science and Industry Corporation

Computer Engineering and Design

Indexed in CSTPCD; Peking University Core Journals
Impact factor: 0.617
ISSN: 1000-7024