
Survey on Large Language Model Alignment Research

With the rapid development of artificial intelligence technology, large language models have been widely applied in numerous fields. However, large language models may generate inaccurate, misleading, or even harmful content, which raises concerns about their reliability; adopting alignment techniques to ensure that the behavior of large language models is consistent with human values has therefore become an urgent problem. This paper surveys recent research progress on alignment techniques for large language models. It introduces common methods for collecting instruction data and human preference datasets, reviews work on supervised tuning and alignment tuning, discusses the datasets and methods commonly used for model evaluation, and concludes with a summary and an outlook on future research directions.

large language model; alignment technique; tuning; reinforcement learning

刘昆麟、屈新纪、谭芳、康红辉、赵少伟、施嵘


ZTE Corporation, Shenzhen, Guangdong 518057, China


2024

Telecommunications Science (电信科学)
China Institute of Communications; Posts & Telecom Press


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.902
ISSN:1000-0801
Year, volume (issue): 2024, 40(6)