
Proximal Policy Optimization Algorithm Combining Attention Mechanism and Curiosity-Driven Exploration

In most real-world problems, extrinsic rewards from the environment are extremely sparse, so the agent receives little feedback and lacks an effective mechanism for updating its policy function. Relying on an intrinsic curiosity mechanism alone, exploration can be misled by useless or harmful curiosity and the exploration task may fail. To address these problems, this paper proposes a proximal policy optimization algorithm that combines an attention mechanism with curiosity-driven exploration. The agent is driven by curiosity to explore the unknown environment, while the rational curiosity obtained through the attention mechanism effectively suppresses the abnormal exploration caused by harmful curiosity, allowing the proximal policy optimization algorithm to update its policy faster and in a more stable state. Experimental results show that with this method the agent achieves better performance and obtains a higher average reward.
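The abstract describes combining PPO with a curiosity-based intrinsic reward that is filtered by an attention mechanism. The sketch below is one plausible reading of that idea, not the paper's actual architecture: an ICM-style forward model yields a prediction-error curiosity signal, a learned attention gate scales it, and the shaped reward feeds a standard PPO clipped objective. All module names, network sizes, the gating formula, and the curiosity coefficient are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact design): ICM-style curiosity reward,
# attention gate to suppress "harmful" curiosity, PPO clipped surrogate loss.
import torch
import torch.nn as nn

class CuriosityWithAttention(nn.Module):
    def __init__(self, obs_dim, act_dim, hid=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hid), nn.ReLU())   # state features
        self.forward_model = nn.Sequential(                                # predicts next features
            nn.Linear(hid + act_dim, hid), nn.ReLU(), nn.Linear(hid, hid))
        self.attention = nn.Sequential(nn.Linear(hid, 1), nn.Sigmoid())    # per-state gate in [0, 1]

    def intrinsic_reward(self, obs, act, next_obs):
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        pred_next = self.forward_model(torch.cat([phi, act], dim=-1))
        error = 0.5 * (pred_next - phi_next).pow(2).mean(dim=-1)           # prediction error = curiosity
        gate = self.attention(phi).squeeze(-1)                             # attention gate on curiosity
        return gate * error.detach(), error.mean()                         # (shaped reward, forward-model loss)

def ppo_clip_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate objective on the combined advantage."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()

if __name__ == "__main__":
    obs_dim, act_dim, batch = 8, 2, 32
    icm = CuriosityWithAttention(obs_dim, act_dim)
    obs, next_obs = torch.randn(batch, obs_dim), torch.randn(batch, obs_dim)
    act = torch.randn(batch, act_dim)
    r_ext = torch.randn(batch)                        # sparse extrinsic reward from the environment
    r_int, fwd_loss = icm.intrinsic_reward(obs, act, next_obs)
    total_reward = r_ext + 0.1 * r_int                # 0.1 is an assumed curiosity coefficient
    with torch.no_grad():                             # placeholder advantages stand in for a real rollout + GAE
        adv = (total_reward - total_reward.mean()) / (total_reward.std() + 1e-8)
    logp_old = torch.randn(batch)
    logp_new = torch.randn(batch, requires_grad=True)
    loss = ppo_clip_loss(logp_new, logp_old, adv) + fwd_loss
    loss.backward()
    print("policy + forward-model loss:", loss.item())
```

In this toy example only the forward model receives a training signal; how the attention gate and policy are actually optimized, and how the intrinsic and extrinsic rewards are balanced, is left to the full method described in the paper.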

Deep reinforcement learning; Attention mechanism; Proximal policy optimization; Curiosity mechanism

陈至栩、张荣芬、刘宇红、王子鹏、黄继辉


College of Big Data and Information Engineering, Guizhou University, Guiyang, Guizhou 550025, China


Guizhou Provincial Science and Technology Program

黔科合平台人才[2016]5707

2024

Computer Applications and Software
Shanghai Institute of Computing Technology; Shanghai Computer Software Technology Development Center


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.615
ISSN:1000-386X
Year, Volume (Issue): 2024, 41(3)