A Study on the Performance of ChatGPT in the Simulated Examination for Clinical Practitioner Qualification in China

Original Chinese title: ChatGPT在中国临床执业医师资格模拟考试中的表现研究

Abstract:

Objective: To evaluate the performance of the chat generative pre-trained transformer (ChatGPT) on a simulated Chinese practicing physician licensing examination, and to explore its strengths and limitations as a reference for medical education and knowledge assessment.

Methods: The study was conducted from July 1 to September 1, 2023. ChatGPT's answers were evaluated on a set of simulated multiple-choice questions for the Chinese practicing physician licensing examination, covering multiple question types and specialties. All questions were drawn from a test-preparation question bank commonly used by medical students and were designed to match the style, content, and difficulty of the Chinese medical licensing examination. The 300 questions were grouped by question type and specialty, and further subdivided into higher-order and lower-order thinking questions. Performance was assessed by answer accuracy.

Results: Across all questions, ChatGPT's answer accuracy was 70.3%. Accuracy on lower-order thinking questions (78.3%) was higher than on higher-order thinking questions (66.0%), and the difference was statistically significant (P < 0.05). Accuracy on clinical medicine and non-clinical medicine questions was 71.0% and 68.7%, respectively, and the difference was not statistically significant (P > 0.05). Across the four question types, accuracy was 69.1%, 64.3%, 73.9%, and 70.8%, respectively, and the difference was not statistically significant (P > 0.05). ChatGPT used confident language in all of its responses (100%), even when the answer was incorrect.

Conclusion: ChatGPT was able to pass the simulated Chinese practicing physician licensing examination, which suggests considerable potential in medical education and medical practice. Its limitations must also be recognized, such as expressing itself confidently even when its answers are inaccurate.
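The comparison between lower-order and higher-order thinking questions can be illustrated with a small sketch. The abstract does not name the statistical test that was used, and it does not report per-group question counts. The code below therefore assumes a Pearson chi-square test on a 2x2 table, and it uses hypothetical counts (106 lower-order and 194 higher-order questions) chosen only because they reproduce the reported percentages (78.3%, 66.0%, and 70.3% overall out of 300). Neither the test nor the counts should be read as the authors' actual method or data.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the test statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability of a chi-square
    # variable equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# HYPOTHETICAL counts, not taken from the paper: back-calculated so that the
# group accuracies round to the percentages reported in the abstract.
lower_correct, lower_wrong = 83, 23      # assumed 106 lower-order questions
higher_correct, higher_wrong = 128, 66   # assumed 194 higher-order questions

lower_acc = lower_correct / (lower_correct + lower_wrong)
higher_acc = higher_correct / (higher_correct + higher_wrong)
overall_acc = (lower_correct + higher_correct) / 300

stat, p_value = chi_square_2x2(lower_correct, lower_wrong,
                               higher_correct, higher_wrong)

print(f"lower-order accuracy:  {lower_acc:.1%}")
print(f"higher-order accuracy: {higher_acc:.1%}")
print(f"overall accuracy:      {overall_acc:.1%}")
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

Under these assumed counts the p-value falls below 0.05, which is consistent with the significance reported in the abstract, but a different split of the 300 questions or a different test (for example, one with a continuity correction) would give a different value.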

Keywords (English):

artificial intelligence; natural language processing; ChatGPT; Chinese practicing physician licensing examination; continuing education; medicine

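Authors:

Zhang Li (张丽), Zhang Xue (张雪), Zhou Haiyan (周海燕), Wen Xin (温馨), Jiang Jiuming (姜九明), Li Jianwei (李健维), Li Tantan (李谭谭), Li Meng (李蒙)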

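Author affiliations:

Department of Diagnostic Imaging, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China

Department of Education, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China

Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China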

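Keywords (translated from Chinese):

artificial intelligence; natural language processing; chat generative pre-trained transformer; Chinese clinical practitioner qualification examination; continuing education; medicine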

Publication year:

2024

DOI:

10.3969/j.issn.1674-9308.2024.15.032

Journal: China Continuing Medical Education (中国继续医学教育)

Impact factor: 2.564

ISSN: 1674-9308

Year, volume (issue): 2024, 16(15)