A Prompt Learning-Based Generative Method for Medical Dialogue Understanding (基于提示学习的生成式医疗对话理解方法)

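Chinese abstract (translated): The dialogue understanding module of a task-oriented dialogue system aims to convert the user's natural language input into a structured form. In diagnosis-oriented medical dialogue systems, however, existing methods have the following problems: 1) they cannot support the information granularity required by precision medicine, such as the severity of a given symptom; 2) they struggle to accommodate the diverse slot-value representations of the medical domain at the same time, such as extractive slots like "symptom", which may contain non-contiguous and nested entities, and categorical slots like "negation". This paper proposes a multi-level generative medical dialogue understanding method based on prompt learning. For problem 1), a multi-level slot structure replaces the single-level slot structure used in current dialogue understanding tasks so as to represent finer-grained information; a generative method based on dialogue-style prompts then uses prompt tokens to simulate doctor-patient dialogue and obtains multi-level information over multiple rounds of interaction. For problem 2), a restricted decoding strategy is applied during inference so that the model handles intent detection and the filling of both categorical and extractive slots in a unified way, avoiding complex modeling. In addition, to address the scarcity of labeled data in the medical domain, a two-stage training strategy is proposed that makes full use of large-scale unlabeled medical dialogue corpora to improve performance. A dataset for medical dialogue understanding with multi-level slot structures is annotated and released, containing 4722 dialogues and covering 17 intents and 74 slot types. Experimental results show that the proposed method can effectively parse the various complex entities in medical dialogues, outperforming existing generative methods by 2.18%, and that in few-shot scenarios two-stage training improves model performance by up to 5.23%.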

English title: Prompt Learning-based Generative Approach Towards Medical Dialogue Understanding

English abstract: The goal of the dialogue understanding module in task-oriented dialogue systems is to convert the user's natural language input into a structured form. However, in diagnosis-oriented medical dialogue systems, existing approaches face the following problems: 1) the granularity of the information cannot fully satisfy the needs of diagnosis, such as providing the severity of a symptom; 2) it is difficult to simultaneously satisfy the diverse representations of slot values in the medical domain, such as "symptom", which may contain non-contiguous and nested entities, and "negation", which takes categorical values. This paper proposes a generative medical dialogue understanding method based on prompt learning. To address problem 1), this paper replaces the single-level slot structure in the current dialogue understanding task with a multi-level slot structure to represent finer-grained information, and then proposes a generative approach based on dialogue-style prompts, which uses prompt tokens to simulate the dialogue between doctor and patient and obtains multi-level information from multiple rounds of interaction. To address problem 2), this paper proposes a restricted decoding strategy for the inference process, so that the model can handle intent detection and the slot-filling tasks for extractive and categorical slots in a unified manner, avoiding complex modeling. In addition, to address the lack of labeled data in the medical domain, this paper proposes a two-stage training strategy that leverages large-scale unlabeled medical dialogue corpora to improve performance. A dataset containing 4722 dialogues, involving 17 intents and 74 types of slots, is annotated and published for the medical dialogue understanding task with a multi-level slot structure. Experiments show that the proposed approach can effectively parse various complex entities in medical dialogues, with 2.18% higher performance than existing generative methods. The two-stage training can improve the performance of the model by up to 5.23% in scenarios with little data.
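
The abstract does not describe how the restricted decoding strategy is implemented. As a rough illustration only, the sketch below shows one common way such a constraint can be expressed: at each generation step, the candidate tokens are limited to a permitted set. For a categorical slot the permitted tokens are those that continue one of the allowed labels; for an extractive slot they are tokens copied from the utterance, which lets a non-contiguous entity be generated. All names here (`allowed_for_categorical`, `allowed_for_extractive`, `constrained_greedy_decode`, the toy scoring functions, and the example utterance) are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Set

EOS = "<eos>"  # hypothetical end-of-sequence marker


def allowed_for_categorical(prefix: Sequence[str], labels: List[List[str]]) -> Set[str]:
    """Tokens that keep `prefix` a prefix of at least one allowed label."""
    allowed = set()
    for label in labels:
        if list(label[:len(prefix)]) == list(prefix):
            # Either continue the label or, if it is complete, allow stopping.
            allowed.add(label[len(prefix)] if len(prefix) < len(label) else EOS)
    return allowed


def allowed_for_extractive(prefix: Sequence[str], utterance: Sequence[str]) -> Set[str]:
    """Any utterance token may be copied; stopping is allowed once output is non-empty."""
    allowed = set(utterance)
    if prefix:
        allowed.add(EOS)
    return allowed


def constrained_greedy_decode(
    score: Callable[[Sequence[str], str], float],
    allowed_fn: Callable[[Sequence[str]], Set[str]],
    max_len: int = 8,
) -> List[str]:
    """Greedy decoding where only tokens returned by `allowed_fn` are considered."""
    output: List[str] = []
    for _ in range(max_len):
        candidates = allowed_fn(output)
        if not candidates:
            break
        # Sorting makes tie-breaking deterministic in this toy example.
        best = max(sorted(candidates), key=lambda tok: score(output, tok))
        if best == EOS:
            break
        output.append(best)
    return output


# Toy stand-in for a model: fixed token preferences. "maybe" scores highest
# but is never produced because it is outside the allowed label set.
TOY_SCORES = {"maybe": 5.0, "not": 3.0, "sure": 2.0, "yes": 1.0, EOS: 0.5}


def toy_categorical_score(prefix: Sequence[str], token: str) -> float:
    return TOY_SCORES.get(token, 0.0)


# Toy stand-in for a model that wants to emit a non-contiguous entity.
TOY_TARGET = ["stomach", "ache", EOS]


def toy_extractive_score(prefix: Sequence[str], token: str) -> float:
    step = len(prefix)
    return 1.0 if step < len(TOY_TARGET) and TOY_TARGET[step] == token else 0.0


labels = [["yes"], ["no"], ["not", "sure"]]
utterance = ["stomach", "and", "head", "both", "ache"]

categorical_value = constrained_greedy_decode(
    toy_categorical_score, lambda p: allowed_for_categorical(p, labels))
extractive_value = constrained_greedy_decode(
    toy_extractive_score, lambda p: allowed_for_extractive(p, utterance))
```

In this sketch the same decoding loop serves both slot types and only the permitted-token function changes, which is one way to read the abstract's claim of handling categorical and extractive slots in a unified manner. A real system would apply the mask to a neural model's output distribution rather than to a toy scoring function.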

English keywords:

Prompt learning; Natural language understanding; Medical dialogue system; Generative model; Two-stage training

Authors:

LIU Jun (柳俊), RUAN Tong (阮彤), ZHANG Huanhuan (张欢欢)

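Affiliation:

School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China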

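Keywords (translated from Chinese):

Prompt learning; natural language understanding; medical dialogue system; generative model; two-stage training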

Funding:

National Key Research and Development Program of China

Project numbers:

2021YFC2701800; 2021YFC2701801

Publication year:

2024

DOI:

10.11896/jsjkx.230300007

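Computer Science (计算机科学)

Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)

CSTPCD; Peking University Core Journals (北大核心)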

Impact factor: 0.944

ISSN: 1002-137X

Year, volume (issue): 2024, 51(5)

Number of references: 28