Computer Applications and Software (计算机应用与软件), 2024, Vol. 41, Issue 3: 258-265, 275. DOI: 10.3969/j.issn.1000-386x.2024.03.040

Proximal Policy Optimization Algorithm Combining an Attention Mechanism with Curiosity-Driven Exploration

陈至栩¹ 张荣芬¹ 刘宇红¹ 王子鹏¹ 黄继辉¹

Author Information

  • 1. College of Big Data and Information Engineering, Guizhou University, Guiyang, Guizhou 550025, China

Abstract

In most real-world problems, rewards from the external world are extremely sparse, so the agent receives little feedback and lacks an effective mechanism for updating its policy function. Relying on an intrinsic curiosity mechanism alone can cause the exploration task to fail under the influence of useless or harmful curiosity. To address these problems, this paper proposes a proximal policy optimization algorithm that combines an attention mechanism with curiosity-driven exploration. The agent explores the unknown environment driven by curiosity, while rational curiosity shaped by the attention mechanism effectively suppresses the abnormal exploration caused by harmful curiosity, allowing the proximal policy optimization algorithm to update its policy faster and in a more stable state. Experimental results show that with this method the agent achieves better performance and obtains a higher average reward.
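The abstract names three ingredients: a curiosity-driven intrinsic reward, an attention mechanism that filters out harmful curiosity, and PPO's clipped policy update. The paper's exact architecture is not reproduced here, so the following is only a minimal NumPy sketch under assumed forms: the function names (`intrinsic_reward`, `attention_weights`, `ppo_clip_objective`, `total_reward`) are illustrative, the curiosity term follows the common ICM-style forward-model prediction error, and gating the curiosity bonus with a scalar attention weight is an assumption about how the two mechanisms could interact.

```python
import numpy as np

def intrinsic_reward(pred_next_feat, next_feat, eta=0.01):
    """ICM-style curiosity bonus: scaled forward-model prediction error
    in feature space (larger error means a more 'surprising' state)."""
    return 0.5 * eta * np.sum((pred_next_feat - next_feat) ** 2, axis=-1)

def attention_weights(features, query):
    """Softmax attention scores over a set of state features; here used
    as a hypothetical gate that down-weights task-irrelevant curiosity."""
    scores = features @ query
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective:
    min(r*A, clip(r, 1-eps, 1+eps)*A), which caps the policy update step."""
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

def total_reward(r_ext, r_int, gate):
    """Combined training reward: extrinsic reward plus an attention-gated
    curiosity bonus (`gate` would come from the attention module)."""
    return r_ext + gate * r_int
```

In a training loop, `total_reward` would replace the raw environment reward when computing advantages, so that curiosity only contributes where the attention gate deems the novelty task-relevant.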

Keywords

Deep reinforcement learning; attention mechanism; proximal policy optimization; curiosity mechanism


Funding

Science and Technology Program of Guizhou Province (Qiankehe Platform Talents [2016]5707)

Publication Year

2024

Journal: Computer Applications and Software (计算机应用与软件)
Publisher: Shanghai Institute of Computing Technology; Shanghai Development Center of Computer Software Technology
Indexing: CSTPCD; Peking University Core Journal (北大核心)
Impact factor: 0.615
ISSN: 1000-386X