Evaluation of the Application Potential and Challenges of Large Language Models in the Field of Laboratory Medicine

Objective To evaluate the performance of ChatGPT-4.0 and ERNIE Bot-4.0 in laboratory medicine, and to explore their application potential and the challenges they face in this professional domain.

Methods Past questions from the National Clinical Medical Laboratory Technology (Intermediate) Examination were used as a benchmark to compare the two models' mastery of laboratory medicine knowledge and the consistency of their answers. The models' ability to interpret test results and assist diagnosis was assessed with 30 laboratory medicine cases.

Results Both models passed the 60% qualification threshold on the clinical medical laboratory technology examination. ChatGPT-4.0 was superior to ERNIE Bot-4.0 in answering speed and consistency, but its answering accuracy was markedly lower (73.25% vs 80.75%). ERNIE Bot-4.0's accuracy also exceeded the average accuracy of clinical laboratory personnel on this examination (78.03%). By question type, both models performed worst on experimental technology questions (ERNIE Bot-4.0: 66.32%; ChatGPT-4.0: 60.53%) and best on basic medical knowledge questions (both 86.00%). In the case analysis test, ERNIE Bot-4.0 scored higher than ChatGPT-4.0 on every item. Both models performed well on routine cases but made errors on complex cases.

Conclusion Both large language models showed some application potential in laboratory medicine. In a Chinese-language setting in particular, ERNIE Bot-4.0 significantly outperformed ChatGPT-4.0 in answering accuracy and case analysis ability, indicating a relative advantage for domestic applications. However, both models still need improvement in experimental technical knowledge, in the analysis of complex cases, and in the accuracy and consistency of their output. At the current stage, directly applying such general-purpose large language models to clinical test result interpretation and assisted diagnosis still carries certain risks, while also pointing to a new research direction for the interpretation of test reports.

Keywords: large language model; medical laboratory; artificial intelligence; result interpretation; case analysis

Lu Xiaoqin (陆小琴), Jia Wei (佳薇), Wu Yuxiang (武宇翔), Wu Yongkang (武永康)

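Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu 610041

The First People's Hospital of Jintang County, Chengdu 610400

School of Pharmacy and Laboratory Medicine, Ya'an Polytechnic College, Ya'an, Sichuan 625000

Hainan Medical University, Haikou 571199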

Funding: 2023 Sichuan Province Science and Technology Activities Project for Returned Overseas Scholars (川人社-202303-5)

2024

Chinese Journal of Clinical Laboratory Science (临床检验杂志)
Jiangsu Medical Association

CSTPCD
Impact factor: 0.746
ISSN: 1001-764X
Year, volume (issue): 2024, 42(8)