Evaluating the Linguistic Competence of Large Language Models: An Experimental Study of Island Effects in Chinese
Probability-based next-token prediction language models have gained traction due to their impressive performance in generating natural-sounding language, leading some academics to believe that they embody the essence of the human language faculty. However, linguists who adopt the grammar-based model of the faculty of language argue that Large Language Models (LLMs) offer limited insight into the nature of human language. In this study, we examine the linguistic competence of GPT-2 and Gemma 2 by evaluating their performance on island effects in Chinese. We conducted two experiments. Experiment 1, deploying the standard factorial design for island effects with quantitative acceptability measures, tested island effects with Chinese native speakers. The results showed that native speakers clearly perceive island effects in Chinese and accurately identify ungrammatical sentence structures. Experiment 2 examined the two models' responses to island effects by calculating the probabilities of minimal pairs of sentences that violate and adhere to island constraints. The results indicated that GPT-2 and Gemma 2 can differentiate the probability distributions of these two sentence types, but they fail to recognize that the lower-probability, island-violating sentences are ones that native speakers would never produce. Comparing the results of the two experiments, we conclude that while LLMs excel at language generation, they still exhibit limitations in understanding and judging specific linguistic phenomena and have not yet reached human-level linguistic competence.
Large Language Models, island effects, experimental syntax, surprisal
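The "factorial design" invoked for Experiment 1 is standardly a 2×2 design crossing dependency length (short vs. long) with structure (non-island vs. island), quantified by a differences-in-differences (DD) score over mean z-scored ratings. The sketch below shows that arithmetic with hypothetical rating values; it is an assumption about the general method, not a description of this paper's exact procedure.

```python
# Minimal sketch of the differences-in-differences (DD) score under the
# 2x2 factorial design for island effects. All rating values are
# hypothetical mean z-scored acceptability ratings, one per condition.
ratings = {
    ("short", "non-island"): 0.9,
    ("long",  "non-island"): 0.5,
    ("short", "island"):     0.7,
    ("long",  "island"):    -0.8,
}

# Independent cost of each factor, measured against the baseline condition.
length_cost    = ratings[("short", "non-island")] - ratings[("long",  "non-island")]
structure_cost = ratings[("short", "non-island")] - ratings[("short", "island")]

# DD score: the superadditive interaction. A value well above zero means
# the long-distance dependency into an island is degraded beyond the sum
# of the two independent costs, i.e., an island effect.
dd = (ratings[("short", "island")] - ratings[("long", "island")]) - length_cost

print(f"length cost = {length_cost:.2f}")     # 0.40
print(f"structure cost = {structure_cost:.2f}")  # 0.20
print(f"DD score = {dd:.2f}")                 # 1.50 - 0.40 = 1.10
```

Under this logic, a large positive DD score from native speakers is the quantitative signature of the island effect that Experiment 1 reports.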
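As a rough illustration of the probability computation described for Experiment 2, the sketch below scores a minimal pair with a causal language model via the HuggingFace `transformers` library. The `gpt2` checkpoint and the placeholder sentences are assumptions for demonstration only, not the study's actual materials; reproducing the experiment would require a Chinese-capable checkpoint (and, for Gemma 2, the corresponding model ID).

```python
# Minimal sketch: total surprisal (negative log-probability) of a sentence
# under a causal LM, used to compare the two members of a minimal pair.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder checkpoint; not the study's actual model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sentence_surprisal(sentence: str) -> float:
    """Return the sentence's total surprisal in nats (lower = more probable)."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy
        # over the predicted tokens; multiply by the number of predicted
        # tokens to recover the summed negative log-likelihood.
        loss = model(ids, labels=ids).loss
    return loss.item() * (ids.size(1) - 1)

# Hypothetical minimal pair (placeholders, not the paper's stimuli).
grammatical = "..."  # island-respecting member of the pair
violating   = "..."  # island-violating member of the pair

if sentence_surprisal(violating) > sentence_surprisal(grammatical):
    print("Model assigns lower probability to the island violation.")
```

A comparison of this kind can show that a model separates the two probability distributions, which is exactly the distinction the abstract says GPT-2 and Gemma 2 draw, without showing that the model treats the low-probability member as unproducible for native speakers.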