Efficiency of different large language models in China in response to consultations about PCa-related perioperative nursing and health education

谭晓文¹, 陈文芳¹, 王娜娜¹, 李惠玉¹, 李娟¹, 曹毓美¹, 朱梦琪¹, 李坤¹, 张廷玲¹, 傅点¹

Author information

1. Department of Urology, General Hospital of Eastern Theater Command, Nanjing, Jiangsu 210002

Abstract

Objective: To evaluate the effectiveness of four domestic large language models with massive user bases and significant public attention (ERNIE Bot, ChatGLM2, Spark Desk, and Qwen-14B-Chat) in responding to consultations about prostate cancer (PCa)-related perioperative nursing and health education. Methods: We designed a questionnaire comprising 15 questions of common concern to patients undergoing radical prostatectomy and 2 common nursing cases, and entered the questions into each of the four models for simulated consultation. Three nursing experts rated the model responses on a pre-designed 5-point Likert scale for accuracy, comprehensiveness, understandability, humanistic care, and case analysis. We then compared and evaluated the performance of the four models using visualization tools and statistical analyses. Results: All the models generated good-quality text with no misleading information and performed satisfactorily. Qwen-14B-Chat scored the highest in all aspects and, together with ChatGLM2, produced relatively stable outputs across multiple tests. Spark Desk performed well in understandability but fell short in comprehensiveness and humanistic care. Both Qwen-14B-Chat and ChatGLM2 performed excellently in case analysis. The overall performance of ERNIE Bot was slightly inferior. Overall, Qwen-14B-Chat performed best in consultations about PCa-related perioperative nursing and health education. Conclusion: In PCa-related perioperative nursing, large language models represented by Qwen-14B-Chat are expected to become powerful auxiliary tools that provide patients with more medical expertise and information support, thereby improving patient compliance and the quality of clinical treatment and nursing.

Key words

prostate cancer/large language model/artificial intelligence/health education

Publication year

2024

National Journal of Andrology (中华男科学杂志)

Nanjing General Hospital of Nanjing Military Command

CSTPCD, CSCD

Impact factor: 1.052

ISSN: 1009-3591

Number of references: 14
