Practical Exploration of Large Language Models in Chinese Automated Essay Evaluation
This study evaluates the performance of large language models on two typical intelligent writing assessment tasks: automated essay scoring and intelligent commentary generation. Focusing on learners of Chinese as a second language, the research applied three prompting strategies, namely standard prompts, chain-of-thought prompts, and self-consistency chain-of-thought prompts, to verify the effectiveness of large language models in automated essay scoring and automated feedback generation. The results show that although large language models demonstrate potential in automated essay scoring, their stability and reliability still need improvement. However, continuously refining these prompting strategies can significantly enhance the models' ability to handle essay scoring and commentary generation. Moreover, different prompts yield different results, and assessing the performance of large language models through prompts involves a degree of subjectivity. Thus, at this stage they cannot fully replace teachers' independent assessment, but they can serve as auxiliary tools that improve the efficiency of teachers' essay evaluation. The findings provide strong support for applying large language models to the intelligent assessment of Chinese writing, emphasizing their potential value in enhancing the performance of assessment systems, and serve as a reference for developing more efficient and accurate intelligent assessment systems for Chinese writing in the future.
Keywords: automated essay evaluation; automated essay scoring; intelligent commentary generation; large language model; ChatGLM
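To make the three prompting strategies named in the abstract concrete, the following is a minimal sketch of how standard, chain-of-thought, and self-consistency chain-of-thought prompting could be wired up for automated essay scoring. The prompt wording, the score range, and the `query_model` helper are hypothetical illustrations, not the authors' published prompts or ChatGLM's actual API; only the self-consistency scheme (sampling several chain-of-thought responses and taking the majority score) follows the standard technique.

```python
# Sketch of three prompting strategies for automated essay scoring.
# All prompt text, the score range, and query_model are assumptions
# made for illustration; they are not taken from the paper.
import re
from collections import Counter


def query_model(prompt: str) -> str:
    """Placeholder for a call to an LLM such as ChatGLM.

    In practice this would invoke the model's chat API with a
    nonzero sampling temperature so repeated calls can differ.
    """
    raise NotImplementedError


STANDARD_PROMPT = (
    "Score the following Chinese learner essay from 0 to 100.\n"
    "{essay}\nScore:"
)

COT_PROMPT = (
    "Score the following Chinese learner essay from 0 to 100. "
    "First analyze its content, organization, grammar, and vocabulary "
    "step by step, then give the final score on the last line as "
    "'Score: <number>'.\n{essay}"
)


def extract_score(reply: str) -> int | None:
    """Pull the last integer from the model's reply, if any."""
    numbers = re.findall(r"\d+", reply)
    return int(numbers[-1]) if numbers else None


def score_standard(essay: str) -> int | None:
    """Standard prompting: ask for a score directly."""
    return extract_score(query_model(STANDARD_PROMPT.format(essay=essay)))


def score_cot(essay: str) -> int | None:
    """Chain-of-thought prompting: request step-by-step analysis first."""
    return extract_score(query_model(COT_PROMPT.format(essay=essay)))


def score_self_consistency(essay: str, n_samples: int = 5) -> int | None:
    """Self-consistency: sample several chain-of-thought responses
    and return the majority (most frequent) score."""
    scores = [score_cot(essay) for _ in range(n_samples)]
    valid = [s for s in scores if s is not None]
    return Counter(valid).most_common(1)[0][0] if valid else None
```

The design reason for the self-consistency variant is visible in the last function: because a single chain-of-thought sample can fluctuate, aggregating several sampled scores by majority vote trades extra model calls for the improved stability the abstract reports as the main weakness of single-shot scoring.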