Journal of Graphics (图学学报), 2024, Vol. 45, Issue 6: 1165-1177. DOI: 10.11996/JG.j.2095-302X.2024061165

An efficient reinforcement learning method based on large language model

Original title: 大模型引导的高效强化学习方法

XU Pei (徐沛)¹, HUANG Kaiqi (黄凯奇)²

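Author information

  • 1. Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • 2. Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China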

Abstract

Deep reinforcement learning, a key technology behind breakthroughs such as AlphaGo and ChatGPT, has become a research hotspot in frontier science. As an important intelligent decision-making technology, it is widely applied to planning and decision-making tasks such as obstacle avoidance in visual scenes, optimized generation of virtual scenes, robotic arm control, digital design and manufacturing, and industrial design decision-making. In practice, however, deep reinforcement learning suffers from low sample efficiency, which severely limits its effectiveness. To alleviate this problem, and to address the shortcomings of existing exploration mechanisms in reinforcement learning, this paper combines large model techniques with several mainstream exploration techniques and proposes an efficient exploration method guided by a large model. Specifically, the semantic extraction capability of a large language model is used to obtain semantic information about states, which then guides the exploration behavior of agents. This semantic information is introduced into classical methods for single-policy exploration and for population-based exploration, respectively. With the large model guiding the exploration behavior of deep reinforcement learning agents, the method shows significant performance improvements in multiple internationally recognized test environments. This research not only demonstrates the potential of large model techniques for exploration in deep reinforcement learning, but also offers a new approach to improving sample efficiency in practical applications.

Key words

deep reinforcement learning / large language model / efficient exploration

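Publication year

2024
Journal of Graphics (图学学报)
China Graphics Society (中国图学学会)

Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 0.73
ISSN: 2095-302X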