Evaluating the Linguistic Competence of Large Language Models: An Experimental Study of Island Effects in Chinese
Probability-based next-token prediction language models have gained traction due to their impressive performance in generating natural-sounding language, leading some academics to believe that they embody the essence of the human language faculty. However, linguists who adopt the grammar-based model of the faculty of language argue that Large Language Models (LLMs) offer limited insight into the nature of human language. In this study, we examine the linguistic competence of GPT-2 and Gemma 2 by evaluating their performance on island effects in Chinese. We conducted two experiments. Experiment 1, deploying the standard factorial design for island effects with quantitative acceptability measures, tested island effects with Chinese native speakers. The results showed that native speakers clearly perceive island effects in Chinese and accurately identify ungrammatical sentence structures. Experiment 2 examined the two models' responses to island effects by calculating the probabilities of minimal pairs of sentences that violate and adhere to island constraints. The results indicated that GPT-2 and Gemma 2 can differentiate the probability distributions of these two sentence types, but they fail to recognize that the lower-probability, island-violating sentences are ones that native speakers would never produce. Comparing the results of the two experiments, we conclude that while LLMs excel at language generation, they still exhibit limitations in understanding and judging specific linguistic phenomena and have not yet reached human-level linguistic competence.
Large Language Models, island effects, experimental syntax, surprisal
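The "factorial design" invoked for Experiment 1 is standardly a 2×2 design crossing dependency length (short vs. long) with structure (non-island vs. island), quantified by a differences-in-differences (DD) score over mean z-scored ratings. The sketch below shows that arithmetic with hypothetical rating values; it is an assumption about the general method, not a description of this paper's exact procedure.

```python
# Minimal sketch of the differences-in-differences (DD) score under the
# 2x2 factorial design for island effects. All rating values are
# hypothetical mean z-scored acceptability ratings, one per condition.
ratings = {
    ("short", "non-island"): 0.9,
    ("long",  "non-island"): 0.5,
    ("short", "island"):     0.7,
    ("long",  "island"):    -0.8,
}

# Independent cost of each factor, measured against the baseline condition.
length_cost    = ratings[("short", "non-island")] - ratings[("long",  "non-island")]
structure_cost = ratings[("short", "non-island")] - ratings[("short", "island")]

# DD score: the superadditive interaction. A value well above zero means
# the long-distance dependency into an island is degraded beyond the sum
# of the two independent costs, i.e., an island effect.
dd = (ratings[("short", "island")] - ratings[("long", "island")]) - length_cost

print(f"length cost = {length_cost:.2f}")     # 0.40
print(f"structure cost = {structure_cost:.2f}")  # 0.20
print(f"DD score = {dd:.2f}")                 # 1.50 - 0.40 = 1.10
```

Under this logic, a large positive DD score from native speakers is the quantitative signature of the island effect that Experiment 1 reports.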
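As a rough illustration of the probability computation described for Experiment 2, the sketch below scores a minimal pair with a causal language model via the HuggingFace `transformers` library. The `gpt2` checkpoint and the placeholder sentences are assumptions for demonstration only, not the study's actual materials; reproducing the experiment would require a Chinese-capable checkpoint (and, for Gemma 2, the corresponding model ID).

```python
# Minimal sketch: total surprisal (negative log-probability) of a sentence
# under a causal LM, used to compare the two members of a minimal pair.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder checkpoint; not the study's actual model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sentence_surprisal(sentence: str) -> float:
    """Return the sentence's total surprisal in nats (lower = more probable)."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy
        # over the predicted tokens; multiply by the number of predicted
        # tokens to recover the summed negative log-likelihood.
        loss = model(ids, labels=ids).loss
    return loss.item() * (ids.size(1) - 1)

# Hypothetical minimal pair (placeholders, not the paper's stimuli).
grammatical = "..."  # island-respecting member of the pair
violating   = "..."  # island-violating member of the pair

if sentence_surprisal(violating) > sentence_surprisal(grammatical):
    print("Model assigns lower probability to the island violation.")
```

A comparison of this kind can show that a model separates the two probability distributions, which is exactly the distinction the abstract says GPT-2 and Gemma 2 draw, without showing that the model treats the low-probability member as unproducible for native speakers.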