Computer Applications and Software (计算机应用与软件), 2024, Vol. 41, Issue 3: 258-265, 275. DOI: 10.3969/j.issn.1000-386x.2024.03.040

Proximal Policy Optimization Algorithm Combining an Attention Mechanism with Curiosity-Driven Exploration

陈至栩¹ 张荣芬¹ 刘宇红¹ 王子鹏¹ 黄继辉¹

Author Information

  • 1. College of Big Data and Information Engineering, Guizhou University, Guiyang, Guizhou 550025, China

Abstract

In most real-world problems, rewards from the external world are extremely sparse, so the agent receives little feedback and lacks an effective mechanism for updating its policy function. Relying on an intrinsic curiosity mechanism alone can cause the exploration task to fail under the influence of useless or harmful curiosity. To address these problems, this paper proposes a proximal policy optimization algorithm that combines an attention mechanism with curiosity-driven exploration. The agent explores the unknown environment driven by curiosity, while rational curiosity shaped by the attention mechanism effectively suppresses the abnormal exploration caused by harmful curiosity, allowing the proximal policy optimization algorithm to update its policy faster and in a more stable state. Experimental results show that with this method the agent achieves better performance and obtains a higher average reward.
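The abstract names three ingredients: a curiosity-driven intrinsic reward, an attention mechanism that filters out harmful curiosity, and PPO's clipped policy update. The paper's exact architecture is not reproduced here, so the following is only a minimal NumPy sketch under assumed forms: the function names (`intrinsic_reward`, `attention_weights`, `ppo_clip_objective`, `total_reward`) are illustrative, the curiosity term follows the common ICM-style forward-model prediction error, and gating the curiosity bonus with a scalar attention weight is an assumption about how the two mechanisms could interact.

```python
import numpy as np

def intrinsic_reward(pred_next_feat, next_feat, eta=0.01):
    """ICM-style curiosity bonus: scaled forward-model prediction error
    in feature space (larger error means a more 'surprising' state)."""
    return 0.5 * eta * np.sum((pred_next_feat - next_feat) ** 2, axis=-1)

def attention_weights(features, query):
    """Softmax attention scores over a set of state features; here used
    as a hypothetical gate that down-weights task-irrelevant curiosity."""
    scores = features @ query
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective:
    min(r*A, clip(r, 1-eps, 1+eps)*A), which caps the policy update step."""
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

def total_reward(r_ext, r_int, gate):
    """Combined training reward: extrinsic reward plus an attention-gated
    curiosity bonus (`gate` would come from the attention module)."""
    return r_ext + gate * r_int
```

In a training loop, `total_reward` would replace the raw environment reward when computing advantages, so that curiosity only contributes where the attention gate deems the novelty task-relevant.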

Keywords

Deep reinforcement learning; attention mechanism; proximal policy optimization; curiosity mechanism


Funding

Science and Technology Program of Guizhou Province (Qiankehe Platform Talents [2016]5707)

Publication Year

2024

Journal: Computer Applications and Software (计算机应用与软件)
Publisher: Shanghai Institute of Computing Technology; Shanghai Development Center of Computer Software Technology
Indexing: CSTPCD; Peking University Core Journal (北大核心)
Impact factor: 0.615
ISSN: 1000-386X