Research on intelligent scoring of subjective questions in Pharmacology exams based on Large Language Models
This article explores the application of Large Language Models (LLMs) to intelligent scoring of subjective questions in Pharmacology. Five LLMs, namely ChatGPT 4.0, Claude 2, iFLYTEK Spark Large Cognitive Model 3.0, ChatGLM 3.0, and ERNIE Bot 3.5, were selected to score short-text subjective questions in Pharmacology under a variety of scoring standards and prompt engineering techniques. The results showed that ChatGPT 4.0 performed best, with a mean absolute error rate (MAER) of 0.0517, a root mean square error (RMSE) of 1.0339, and an intraclass correlation coefficient (ICC) of 0.936, indicating a high level of consistency and accuracy in its scoring. Claude 2 followed closely, with an MAER of 0.0724, an RMSE of 1.2999, and an ICC of 0.893, demonstrating good scoring performance. The other models performed poorly in terms of score consistency and bias, especially iFLYTEK Spark Large Cognitive Model 3.0, with an MAER of 0.2828, an RMSE of 3.0286, and an ICC of only 0.217. Overall, LLMs can effectively apply their language comprehension and logical reasoning abilities to score subjective questions intelligently and to provide detailed scoring analysis, which helps to improve students' learning efficiency and self-evaluation ability. Compared with traditional manual scoring, LLMs offer higher efficiency and cost-effectiveness for intelligent scoring of subjective questions. This study provides a new perspective and method for applying advanced models such as ChatGPT in the field of education, and serves as a reference for the development and application of artificial intelligence in future education.
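The three agreement metrics reported above can be computed from paired human and model scores. The sketch below is a hypothetical illustration, not the paper's own code: it assumes MAER is the mean absolute error normalized by the question's maximum score, and it computes ICC(3,1) (two-way mixed, consistency, single measurement) for the two-rater case.

```python
import numpy as np

def scoring_metrics(human, model, max_score):
    """Illustrative MAER, RMSE, and ICC(3,1) for paired human/model scores.

    Assumptions (not confirmed by the source): MAER = mean absolute
    error divided by the maximum attainable score; ICC is the two-way
    mixed, consistency, single-measurement form with k = 2 raters.
    """
    human = np.asarray(human, dtype=float)
    model = np.asarray(model, dtype=float)

    maer = np.mean(np.abs(model - human)) / max_score
    rmse = np.sqrt(np.mean((model - human) ** 2))

    # ICC(3,1) via a two-way ANOVA decomposition.
    data = np.stack([human, model], axis=1)   # shape (n_items, k_raters)
    n, k = data.shape
    grand = data.mean()
    ss_rows = k * np.sum((data.mean(axis=1) - grand) ** 2)   # between items
    ss_cols = n * np.sum((data.mean(axis=0) - grand) ** 2)   # between raters
    ss_err = np.sum((data - grand) ** 2) - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    icc = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return maer, rmse, icc

# Example with made-up scores on a 10-point question:
maer, rmse, icc = scoring_metrics([8, 6, 9, 5], [7, 6, 8, 5], max_score=10)
```

Lower MAER and RMSE indicate smaller deviation from the human reference score, while an ICC closer to 1 indicates stronger human-model agreement, matching the ranking of the five models reported above.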
Keywords: artificial intelligence; Large Language Models; intelligent scoring of subjective questions; Pharmacology; prompt engineering