
Trust Reshaping in the Era of Large Models: Mechanisms and Patterns of Super-Alignment Achieved by Small Models

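Large models in operation face three trust crises: a content trust crisis, a value trust crisis, and a model trust crisis. "Alignment" is regarded as a viable path to resolving them. However, alignment work that relies mainly on human feedback struggles to cope with "superhuman models" that may emerge beyond the scope of human intelligence. "Super-alignment", in which the small supervises the large and the weak responds to the strong, helps make such superhuman models more harmless and more trustworthy. In super-alignment, small models become key actors in reshaping trust. Specifically, scenario-specific small models, as "capable" small models, align with vertical domains and thereby ease the content trust crisis. Private-domain small models build the image of a "trustworthy" small model and achieve real-time alignment. "Dependable" edge small models align with edge values and stabilize the alignment environment. "Connectable" large and small models work in tandem and reach inter-model alignment through debate. Going forward, small models will reshape trust by starting from personalization, transparency, and empowerment, with emotional trust, technological trust, and power trust advancing together.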
Trust Reshaping in the Era of Big Models: Mechanisms and Patterns for Achieving Super-Alignment with Small Models
In response to the crises of content trust, value trust, and model trust that arise in the operation of big models, "alignment" has been proposed as a viable solution. However, alignment that relies on human feedback faces significant challenges given the potential emergence of "superhuman models", that is, models that surpass human intelligence. To address this, "super-alignment", which embodies a "small supervises big" and "weak against strong" dynamic, is crucial: it enhances the harmlessness and trustworthiness of superhuman models. We argue that in super-alignment, small models play a pivotal role in reshaping trust. Specifically, scenario-specific small models act as "capable" entities that align with vertical domains to mitigate the content trust crisis. Private-domain small models cultivate an image of being "trustworthy" and enable real-time alignment. "Dependable" edge small models align with edge values and stabilize the alignment environment. "Connectable" large and small models work together to achieve inter-model alignment through debate. Looking ahead, the path for small models to reshape trust should begin with personalization, transparency, and empowerment, with emotional trust, technological trust, and power trust all working together to realize it.

Keywords: big model; super-alignment; small model; trust crisis; trust reshaping

Yu Guoming, Bian Zhongming


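School of Journalism and Communication, Beijing Normal University (Beijing 100091)

Communication Innovation and Future Media Laboratory Platform, School of Journalism and Communication, Beijing Normal University (Beijing 100091)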


2024

Journal of Social Science of Hunan Normal University
Hunan Normal University


Indexed in: CSTPCD, CSSCI, CHSSCD, Peking University Core Journals
Impact factor: 1.06
ISSN: 1000-2529
Year, volume (issue): 2024, 53(3)