大模型引导的高效强化学习方法

An efficient reinforcement learning method based on large language model


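Chinese abstract (translated): Deep reinforcement learning, a key technology behind breakthroughs such as AlphaGo and ChatGPT, has become a research hotspot in frontier science. In practice, as an important intelligent decision-making technology, it is widely applied to planning and decision-making tasks such as obstacle avoidance in visual scenes, optimized generation of virtual scenes, robotic arm control, digital design and manufacturing, and industrial design decision-making. However, deep reinforcement learning suffers from low sample efficiency in real applications, which severely limits its effectiveness. To alleviate this problem, and to address the shortcomings of existing reinforcement learning exploration mechanisms, this work combines large model technology with several mainstream exploration techniques and proposes an efficient exploration method guided by a large model to improve sample efficiency. By using a large model to guide the exploration behavior of deep reinforcement learning agents, the method shows significant performance improvements in several internationally recognized test environments. This not only demonstrates the potential of large model technology for exploration problems in deep reinforcement learning, but also offers a new approach to improving sample efficiency in practical applications.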

English abstract: Deep reinforcement learning, a key technology behind breakthroughs such as AlphaGo and ChatGPT, has become a research hotspot in frontier science. In practical applications, deep reinforcement learning, as an important intelligent decision-making technology, is widely used in a variety of planning and decision-making tasks, such as obstacle avoidance in visual scenes, optimal generation of virtual scenes, robotic arm control, digital design and manufacturing, and industrial design decision-making. However, deep reinforcement learning suffers from low sample efficiency in practical applications, which greatly limits its effectiveness. To improve sample efficiency, this paper proposes an efficient exploration method based on large model guidance, which combines a large model with mainstream exploration techniques. Specifically, we use the semantic extraction capability of a large language model to obtain semantic information about states, which is then used to guide the exploration behavior of agents. We then introduce this semantic information into classical methods for single-policy exploration and for population-based exploration, respectively. By using the large model to guide the exploration behavior of deep reinforcement learning agents, our method shows significant performance improvements in popular environments. This research not only demonstrates the potential of large model techniques for exploration problems in deep reinforcement learning, but also provides a new approach to alleviating low sample efficiency in practical applications.

English keywords:

deep reinforcement learning; large language model; efficient exploration

Authors:

徐沛 (Xu Pei), 黄凯奇 (Huang Kaiqi)


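Author affiliations:

Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 100190

Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031

School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049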

Chinese keywords:

深度强化学习 (deep reinforcement learning); 大语言模型 (large language model); 高效探索 (efficient exploration)

Publication year:

2024

DOI:

10.11996/JG.j.2095-302X.2024061165

Journal of Graphics (图学学报)

China Graphics Society (中国图学学会)

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.73

ISSN: 2095-302X

Year, volume (issue): 2024, 45(6)