
Model-Agnostic Meta-Reinforcement Learning Based on Similarity Weighting

Reinforcement learning has achieved strong results in fields such as game playing and robot control. To further improve training efficiency, meta-learning has been extended to reinforcement learning, and the resulting meta-reinforcement learning has become a research hotspot in the reinforcement learning community. The quality of the meta-knowledge is the key factor determining the effectiveness of meta-reinforcement learning; gradient-based meta-reinforcement learning uses the initial parameters of the model as meta-knowledge to guide subsequent learning. To improve the quality of this meta-knowledge, we propose a general meta-reinforcement learning method that makes the contribution of each subtask to the training effect explicit through a weighting mechanism. The method takes the similarity between the gradient update vector obtained on each subtask and all gradient update vectors in the task set as the update weight, refining the gradient update process, improving the quality of the meta-knowledge (the initial model parameters), and enabling the trained model to solve new tasks from a good starting point. The method can be applied to any gradient-based reinforcement learning algorithm and achieves the goal of quickly solving new tasks with a small number of samples. In comparative experiments on 2D navigation tasks and simulated robot locomotion control tasks, the method outperforms the baseline algorithms, demonstrating the soundness of the weighting mechanism.
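As a minimal sketch of the weighting mechanism described above, the outer (meta) update of a MAML-style learner could weight each task's gradient update vector by its similarity to the aggregate of all task gradients in the batch. The abstract does not specify the similarity measure or the weight normalization, so the cosine similarity and softmax normalization below, and all function and variable names, are assumptions for illustration only:

    import numpy as np

    def similarity_weighted_meta_update(theta, task_grads, beta=0.01):
        # task_grads: list of per-task meta-gradient vectors, each of shape (dim,)
        grads = np.stack(task_grads)                  # (num_tasks, dim)
        aggregate = grads.sum(axis=0)                 # combined update direction of the task set
        # Cosine similarity of each task's gradient to the aggregate direction.
        sims = grads @ aggregate / (
            np.linalg.norm(grads, axis=1) * np.linalg.norm(aggregate) + 1e-8
        )
        # Assumed normalization: softmax over similarities to obtain weights.
        weights = np.exp(sims - sims.max())
        weights /= weights.sum()
        # Outer-loop gradient step with the similarity-weighted combination.
        return theta - beta * (weights[:, None] * grads).sum(axis=0)

Under this scheme, a task whose gradient points in roughly the same direction as the rest of the batch contributes more to the meta-update, while an outlier task is down-weighted, which matches the stated goal of making each subtask's contribution to the shared initial parameters explicit.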

meta-learning; reinforcement learning; meta-reinforcement learning; gradient descent; model-agnostic

Zhao Chunyu, Lai Jun, Chen Xiliang, Zhang Renwen


College of Command and Control Engineering, Army Engineering University of PLA, Nanjing, Jiangsu 210007, China


National Natural Science Foundation of China (61806221)

2024

Computer Technology and Development
Shaanxi Computer Society


CSTPCD
Impact factor: 0.621
ISSN: 1673-629X
Year, volume (issue): 2024, 34(5)