
A Review of Probing-Based Interpretability Methods in Natural Language Processing
The widespread adoption of large-scale pre-trained models in multiple fields, particularly in natural language processing tasks such as text classification and machine translation, has paved the way for remarkable advancements. Nonetheless, due to the "black box" nature of pre-trained language models, their internal decision patterns and encoded knowledge are considered opaque. While advanced pre-trained language models such as ChatGPT and GPT-4, released by OpenAI, have achieved significant performance breakthroughs in various domains, they may not be appropriate for fields that place high importance on security and fairness. This is attributed to the difficulty of verifying whether these models inherently encode the desired knowledge and linguistic properties without entailing any internal biases or discrimination. In the pursuit of better understandability and transparency of pre-trained models, a new interpretability scheme known as the "probing task" has emerged in recent years. This task promises to enhance our understanding of the linguistic properties encoded in each layer of pre-trained models. It takes model outputs from arbitrary positions as input, employs a probing model to train auxiliary linguistic tasks (e.g., part-of-speech tagging, dependency parsing), and subsequently gauges the degree to which specific linguistic properties are encoded at the position under analysis, based on the auxiliary model's performance on the test set. For example, by freezing the model parameters and training probing tasks at different layers, existing studies have demonstrated that pre-trained models encode more lexical properties at lower layers and more semantic properties at higher layers. However, due to the toxicity within pre-training data, there is a significant possibility that the parameters encode a substantial amount of harmful content. Our review begins with an introduction to the basic framework of the probing task, where we delve into the definition of probing tasks and outline the basic workflow of carrying out such a task. Then we systematically summarize existing schemes for probing tasks in natural language processing, including the most commonly used diagnostic classifiers and other probing methods derived from them (structural probing, intervention-based probing, and prompt-based probing), to provide readers with ideas for designing reasonable probing tasks. For diagnostic classifiers, we also focus on the selection of probing model complexity and probing datasets to guide the design of more reliable probing experiments. After that, we describe how to interpret the experimental results of probing tasks from the perspective of comparisons and controls, to illustrate the extent to which the probing position encodes properties of interest. Finally, as we come to the end of the review, we take stock of the main applications and discuss potential key research directions to be pursued. We further ruminate on the current issues and challenges that the field of probing tasks faces and needs to address. Undeniably, as a relatively novel area of research, extant probing methods remain insufficiently mature, encompassing both theoretical shortcomings inherent in the design of probing tasks themselves and an inadequacy in exploring more intricate linguistic properties. This paper aspires to furnish readers with a comprehensive "diagnostic report" on ongoing probing task research, while advocating for increased scholarly investment in pertinent domains.
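The diagnostic-classifier workflow summarized above can be sketched in a few lines. This is a minimal illustration, not any method from the surveyed papers: synthetic random vectors stand in for a pre-trained model's frozen hidden states, and a simple logistic-regression probe is trained on top of them to test whether a (here, artificially injected) linguistic property is linearly decodable.

```python
# Minimal sketch of a diagnostic-classifier probe.
# The "representations" are synthetic stand-ins for frozen hidden states
# taken from some layer of a pre-trained model; the model itself is never
# updated -- only the small probe on top is trained.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical binary property (e.g., NOUN vs. VERB) and 64-dim states.
n, dim = 400, 64
labels = rng.integers(0, 2, size=n)
reps = rng.normal(size=(n, dim))
reps[:, :4] += 2.0 * labels[:, None]  # property linearly encoded in 4 dims

X_train, X_test, y_train, y_test = train_test_split(
    reps, labels, test_size=0.25, random_state=0)

# The probe: a deliberately low-capacity classifier, so that high test
# accuracy reflects information already present in the representations
# rather than learned by the probe itself.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = probe.score(X_test, y_test)
print(f"probe accuracy: {accuracy:.2f}")
```

High held-out accuracy relative to a control (e.g., the same probe on shuffled labels) is then read as evidence that the probed position encodes the property of interest, which is the comparison-and-control logic the review discusses.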

probing task; interpretability; natural language processing; pre-trained model; deep learning; artificial intelligence security

JU Tianjie, LIU Gongshen, ZHANG Zhuosheng, ZHANG Ru


School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China


Key Special Project on Social Governance and Smart Society Technology Support; Joint Key Project of the National Natural Science Foundation of China; Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project; Shanghai Science and Technology Program

2023YFC3303805; U21B2020; 2022ZD0120304; 22511104400

2024

Chinese Journal of Computers
China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 3.18
ISSN:0254-4164
Year, Volume (Issue): 2024, 47(4)