Review of Generative Reinforcement Learning Based on Sequence Modeling

Reinforcement learning is the branch of machine learning concerned with learning to make decisions. It addresses sequential decision-making problems, in which an agent finds an optimal policy through repeated trial-and-error interaction with an environment. Reinforcement learning can be combined with generative models to improve their performance, and is typically used to fine-tune generative models so that they produce higher-quality content. The reinforcement learning process can also be viewed as a general sequence modeling problem: the distribution over task trajectories is modeled, and a pre-trained generative model produces a sequence of actions that yields high returns. By building on a model of the input information, generative reinforcement learning can better handle uncertain and unknown environments and more efficiently convert sequence data into policies for decision-making. This review first introduces reinforcement learning algorithms and sequence modeling methods and analyzes how data sequences are modeled. It then discusses the state of development of reinforcement learning, organized by the type of neural network model used. On this basis, it surveys methods that combine reinforcement learning with generative models and analyzes the application of reinforcement learning methods in generative pre-trained models. Finally, it summarizes the theoretical and applied development of the relevant technologies.

Artificial intelligence; Reinforcement learning; Neural network; Generative model; Sequence modeling

YAO Tianlei (姚天磊), CHEN Xiliang (陈希亮), YU Peiyi (余沛毅)

College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China

National Natural Science Foundation of China

62273356

2024

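Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)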

CSTPCD; Peking University Core Journal
Impact factor: 0.944
ISSN: 1002-137X
Year, volume (issue): 2024, 51(11)