A Review of Probing-Based Interpretability Methods in Natural Language Processing
The widespread adoption of large-scale pre-trained models in multiple fields, particularly in natural language processing tasks such as text classification and machine translation, has paved the way for remarkable advancements. Nonetheless, due to the "black box" nature of pre-trained language models, their internal decision patterns and encoded knowledge are considered opaque. While advanced pre-trained language models such as ChatGPT and GPT-4, released by OpenAI, have achieved significant performance breakthroughs in various domains, they may not be appropriate for fields that place a high premium on security and fairness. This is attributed to the difficulty of verifying whether these models inherently encode the desired knowledge and linguistic properties without harboring internal biases or discrimination. In the pursuit of better understandability and transparency of pre-trained models, a new interpretability scheme known as the "probing task" has emerged in recent years. This task promises to enhance our understanding of the linguistic properties encoded in each layer of pre-trained models. It takes model outputs from arbitrary positions as input, employs a probing model to train auxiliary linguistic tasks (e.g., part-of-speech tagging, dependency parsing), and then gauges, based on the auxiliary model's performance on the test set, the degree to which specific linguistic properties are encoded at the layer under analysis. For example, by freezing the model parameters and training the probing task at different layers, existing studies have demonstrated that pre-trained models encode more lexical properties at lower layers and more semantic properties at higher layers. However, due to toxicity within the pre-training data, there is a significant possibility that the parameters encode a substantial amount of harmful content. Our review begins with an introduction to the basic framework of the probing task, where we delve into the definition of probing
tasks and outline the basic workflow of carrying out such a task. We then systematically summarize existing schemes for probing tasks in natural language processing, including the most commonly used diagnostic classifiers and the probing methods derived from them (structural probing, intervention-based probing, and prompt-based probing), to provide readers with ideas for designing reasonable probing tasks. For diagnostic classifiers, we also focus on the selection of probing model complexity and probing datasets, to guide the design of more reliable probing experiments. After that, we describe how to interpret the experimental results of probing tasks from the perspective of comparisons and controls, illustrating the extent to which the probed position encodes properties of interest. Finally, we take stock of the main applications and discuss key research directions worth pursuing, and we further reflect on the issues and challenges that the field of probing tasks currently faces and needs to address. Undeniably, as a relatively new area of research, existing probing methods remain insufficiently mature, with both theoretical shortcomings inherent in the design of probing tasks themselves and an inadequate exploration of more intricate linguistic properties. This paper aims to furnish readers with a comprehensive "diagnostic report" on ongoing probing task research, while advocating for increased scholarly investment in the relevant domains.
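The diagnostic-classifier workflow described above (freeze the pre-trained model, extract representations at a given layer, train a lightweight auxiliary classifier, and read off its test accuracy) can be sketched as follows. This is a minimal, self-contained illustration, not any specific system from the literature: the representations and "part-of-speech" labels are synthetic stand-ins for hidden states that would, in practice, be extracted from a frozen pre-trained model.

```python
import numpy as np

# Hypothetical setup: `layer_reprs` stands in for hidden states taken from
# one layer of a frozen pre-trained model; `labels` stands in for the
# auxiliary linguistic annotation (e.g., POS tags). Both are synthetic here
# so the sketch runs on its own.
rng = np.random.default_rng(0)
n_tokens, hidden_dim, n_tags = 200, 16, 3
layer_reprs = rng.normal(size=(n_tokens, hidden_dim))
true_w = rng.normal(size=(hidden_dim, n_tags))
labels = (layer_reprs @ true_w).argmax(axis=1)  # linearly recoverable tags

def train_probe(x, y, n_classes, lr=0.5, epochs=300):
    """Train a linear softmax probe on frozen representations."""
    w = np.zeros((x.shape[1], n_classes))
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = x @ w
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        w -= lr * x.T @ (p - onehot) / len(x)  # gradient of cross-entropy
    return w

w = train_probe(layer_reprs, labels, n_tags)
acc = ((layer_reprs @ w).argmax(axis=1) == labels).mean()
# High probe accuracy is taken as evidence that the probed layer
# linearly encodes the property of interest; comparing `acc` across
# layers yields the layer-wise picture the abstract refers to.
```

Note that the probe is deliberately kept simple (a single linear layer): as the review discusses, probe complexity must be controlled so that high accuracy reflects information present in the representations rather than the probe learning the task itself.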
Keywords: probing task; interpretability; natural language processing; pre-trained model; deep learning; artificial intelligence security