Has ChatGPT Mastered the Sentence Length Patterns of Written Modern Chinese?
This paper studies the relationship between full-sentence length and minor-sentence length using texts from 15 written genres in the Lancaster Corpus of Modern Chinese, and compares the sentence length distributions of three of these genres with those of text generated by ChatGPT. The full and minor sentences of written modern Chinese conform to the length relation patterns of adjacent hierarchical units in human languages. The Chinese text generated by ChatGPT basically conforms to the frequency distribution and hierarchical unit patterns of natural languages, but gaps remain with respect to the Least Effort Principle and genre differentiation. The findings show that large language models can grasp the statistical laws of natural language, but they may not have mastered the subjective characteristics of authentic texts.
modern Chinese; sentence length distribution; language universals; stylistic difference; ChatGPT