Has ChatGPT Mastered the Sentence Length Patterns of Written Modern Chinese?

This paper uses texts from 15 written genres in the Lancaster Corpus of Mandarin Chinese (LCMC) to study the relationship between the lengths of full sentences and minor sentences in Chinese, and compares the sentence length distributions of three of these genres with those of text generated by ChatGPT. The results show that full and minor sentences in written modern Chinese conform to the length relationship that holds between adjacent hierarchical units in language. The Chinese text generated by ChatGPT largely conforms to the probability distribution of sentence length and the hierarchical unit patterns of natural language, but it still falls short of authentic text in following the Principle of Least Effort and in distinguishing genres. Large language models have acquired some of the statistical regularities of natural language, but they have not yet fully mastered some of the fine-grained features of authentic texts.
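As a rough illustration of the kind of measurement the abstract describes, the sketch below splits Chinese text into full sentences and minor sentences and reports the mean minor-sentence length for each full-sentence length. It is only a minimal sketch under stated assumptions, not the authors' procedure: the punctuation sets, the choice to count minor-sentence length in characters, and the names `split_units` and `mean_minor_length_by_full_length` are all hypothetical and do not come from the paper.

```python
import re
from collections import defaultdict

# Assumed delimiters (not taken from the paper): full-width marks that end
# a full sentence, and full-width marks that separate minor sentences.
FULL_END = "。!?"
MINOR_END = ",;:"


def split_units(text):
    """Return a list of full sentences, each given as a list of minor sentences."""
    sentences = []
    for full in re.split(f"[{FULL_END}]", text):
        minors = [m.strip() for m in re.split(f"[{MINOR_END}]", full)]
        minors = [m for m in minors if m]  # drop empty fragments
        if minors:
            sentences.append(minors)
    return sentences


def mean_minor_length_by_full_length(text):
    """Map full-sentence length (in minor sentences) to the mean
    minor-sentence length (in characters, an assumed unit)."""
    buckets = defaultdict(list)
    for minors in split_units(text):
        buckets[len(minors)].extend(len(m) for m in minors)
    return {n: sum(v) / len(v) for n, v in sorted(buckets.items())}


sample = "今天天气很好,我们去公园。他来了。"
print(split_units(sample))
print(mean_minor_length_by_full_length(sample))
```

Running the same functions on an authentic text and on a generated text of the same genre would give two tables that can be compared side by side; a word-based length measure would need a Chinese word segmenter, which this sketch deliberately leaves out.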

Keywords: modern Chinese; sentence length distribution; language universals; stylistic difference; ChatGPT

Zhou Yikai (周义凯), Liu Haitao (刘海涛)

School of International Studies, Zhejiang University, Hangzhou, Zhejiang 310058

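Funding: Major Project of the Key Research Institutes of Humanities and Social Sciences, Ministry of Education (Grant No. 22JJD740018)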

2024

Applied Linguistics (语言文字应用)
Institute of Applied Linguistics, Ministry of Education

Indexed in: CSTPCD, CSSCI, CHSSCD, PKU Core (北大核心)
Impact factor: 1.215
ISSN:1003-5397
Year, volume (issue): 2024, (2)