A study on multi-class classification of medical questionnaire item texts based on prompt learning

Objective: Medical questionnaire resources are currently processed and organized mainly at the document level, which makes it hard for users to retrieve and reuse individual questionnaire items. This study proposes a multi-class classification method for medical questionnaire items in low-resource scenarios, to support fine-grained organization and delivery of medical questionnaire resources.

Methods: We used a prompt-learning classification approach built on the pre-trained language model BERT to classify medical questionnaire item texts into multiple classes. First, we collected clinical assessment questionnaires for lung cancer, extracted their function and domain classification labels, and manually annotated each item with a combined "function-domain" label, yielding a small corpus of lung cancer clinical assessment items. We then applied prompt learning: each item was placed into a custom-built template and fed into BERT, which predicted the text filling the template's masked slot. Finally, the predicted filler text was mapped to a label, giving the multi-class classification of the item.

Results: The corpus contains 347 lung cancer clinical assessment items covering nine "function-domain" labels. On this self-constructed dataset, the proposed method achieved an average accuracy of 93%, about 6% higher than the runner-up model, GAN-BERT.

Conclusion: The BERT-based prompt learning classification method reduces the cost of building medical questionnaire item corpora while maintaining strong performance, and merits wider adoption in research and practice on medical questionnaire item classification.
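The Methods step (place an item in a template, let BERT fill the masked slot, map the filler to a label) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The English template wording, the label names, and the label words are all hypothetical. `toy_scores` is a hypothetical keyword counter that stands in for BERT's masked-language-model head, so the sketch runs without downloading a model.

```python
from typing import Callable, Dict, List

MASK = "[MASK]"  # BERT's mask token

# Hypothetical cloze template; the authors' actual template is not given here.
TEMPLATE = "{item} This item is about " + MASK + "."

# Hypothetical verbalizer: each "function-domain" label maps to label words
# that the masked-LM head may predict at the [MASK] slot.
VERBALIZER: Dict[str, List[str]] = {
    "evaluation-pain": ["pain"],
    "evaluation-sleep": ["sleep"],
    "evaluation-emotion": ["mood"],
}


def wrap(item: str, template: str = TEMPLATE) -> str:
    """Insert an item text into the cloze template (one [MASK] slot)."""
    return template.format(item=item)


def classify(
    item: str,
    verbalizer: Dict[str, List[str]],
    mask_word_scores: Callable[[str], Dict[str, float]],
) -> str:
    """Pick the label whose label words score highest at [MASK].

    mask_word_scores(prompt) stands in for BERT's masked-LM head and maps
    candidate words to scores for filling the [MASK] slot.
    """
    scores = mask_word_scores(wrap(item))

    def label_score(words: List[str]) -> float:
        # Mean score of the label's words; words with no score count as -inf.
        return sum(scores.get(w, float("-inf")) for w in words) / len(words)

    return max(verbalizer, key=lambda label: label_score(verbalizer[label]))


def toy_scores(prompt: str) -> Dict[str, float]:
    """Hypothetical stand-in for the MLM head: counts keyword cues."""
    text = prompt.lower()
    cues = {
        "pain": ["pain", "hurt"],
        "sleep": ["sleep", "night"],
        "mood": ["sad", "worried", "anxious"],
    }
    return {word: float(sum(c in text for c in cs)) for word, cs in cues.items()}


if __name__ == "__main__":
    for q in ["How much pain did you have in the past week?",
              "Did you have trouble sleeping at night?",
              "Have you felt sad or worried?"]:
        print(q, "->", classify(q, VERBALIZER, toy_scores))
```

With the Hugging Face `transformers` library, a real scorer could plausibly be built on `pipeline("fill-mask", model=...)` called with `targets=` set to the label words. That call returns a score for each target word. Multi-token label words, such as multi-character Chinese words, would need one `[MASK]` per token, and this sketch does not handle that case.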

Keywords: Medical questionnaire; Question classification; Multi-class classification; Prompt learning; Pre-trained language model

Hao Jie, Peng Qinglong, Cong Shan, Li Jiao, Sun Haixia

Institute of Medical Information, Chinese Academy of Medical Sciences / Peking Union Medical College (Beijing 100020)

Qingdao Innovation and Development Base, Harbin Engineering University (Qingdao, Shandong 266000)

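Funding: National Social Science Fund of China (21BTQ069); CAMS Innovation Fund for Medical Sciences (2021-I2M-1-056); National Key Research and Development Program of China (2022YFC3601005)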

2024

Chinese Journal of Evidence-Based Medicine
Sichuan University

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.761
ISSN:1672-2531
Year, volume (issue): 2024, 24(1)