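Joint Entity and Relation Extraction for Tibetan Medicine Texts Based on Encoder-Decoder Architectures (基于编码器-解码器架构的藏医药文本实体关系联合抽取)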

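Abstract (translated from the Chinese): In Tibetan medicine, accurately extracting medical entities and their relations from medical texts and structuring them as triples is important for building a Tibetan medicine knowledge graph. However, existing methods mainly rely on general-purpose pre-trained models to process Tibetan medicine texts; these models do not adequately cover the domain's specialized terminology and fall short in generalization and robustness. To address this, the paper proposes a new model based on the encoder-decoder architecture that incorporates a pointer mechanism. In the encoding stage, BERT and GloVe are used to generate rich embedding representations, which are fused to strengthen the model's understanding of medical-domain text; in the decoding stage, a Transformer decoder is combined with a pointer mechanism so that the model directly generates structured information about entities and relations. In addition, by introducing the concept of "similar spans" and a corresponding penalty-based training strategy, the paper further improves the model's ability to recognize entities. Extensive experiments on CMeIE-V2 and the Tibetan medicine dataset TibetanAI_TMDisRE_v1.0, with comparisons against baseline models, confirm the model's strong performance and robustness.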
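The abstract says the BERT and GloVe representations are fused but does not say how. As a minimal sketch, assuming the simplest option (per-token concatenation) and assuming the two embedding sequences are already aligned token by token, the fusion step could look like the following. The function name `fuse_embeddings` and the toy vectors are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def fuse_embeddings(contextual: List[Vector], static: List[Vector]) -> List[Vector]:
    """Concatenate a contextual and a static embedding for each token.

    ASSUMPTION: concatenation is only one possible fusion; the paper's
    actual fusion method is not described in the abstract.
    """
    if len(contextual) != len(static):
        raise ValueError("both sequences must cover the same tokens")
    # List addition joins the two vectors end to end for each token.
    return [c + s for c, s in zip(contextual, static)]


# Toy stand-ins for BERT-style (3-dim) and GloVe-style (2-dim) vectors.
bert_vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
glove_vectors = [[1.0, 2.0], [3.0, 4.0]]

fused = fuse_embeddings(bert_vectors, glove_vectors)
```

Each fused vector here has 3 + 2 = 5 dimensions. Other fusions (summation after projection, gating) would fit the abstract's wording equally well.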

English title: Joint Entity and Relation Extraction for Tibetan Medicine Texts Based on Encoder-Decoder Architectures

English abstract: In the field of Tibetan medicine, accurately extracting medical entities and their relationships from medical texts and structuring them into triples is crucial for constructing knowledge graphs. However, existing methods, which mainly rely on general pre-trained models to process Tibetan medicine texts, often overlook the specialized terminology, leading to limitations in generalization and robustness. This paper proposes a model based on the encoder-decoder architecture, enhanced with a pointer mechanism, to overcome these shortcomings. In the encoding phase, the model uses BERT and GloVe to generate rich embeddings, significantly improving its understanding of medical terms. In the decoding phase, a Transformer decoder is combined with a pointer mechanism to produce structured entity and relationship information directly. The training process incorporates the concept of similar spans to refine the model's entity recognition capabilities. Experiments on the CMeIE-V2 and TibetanAI_TMDisRE_v1.0 datasets show that this model outperforms advanced baselines in performance and robustness.
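To illustrate what a pointer mechanism does in the decoding phase, here is a minimal sketch under stated assumptions: the decoder state is scored against every encoder token state with a dot product, a softmax turns the scores into a distribution over input positions, and the highest-probability positions are taken as the start and end of an entity span. The dot-product scoring, the greedy selection, and all names (`pointer_distribution`, `point_to_span`) are assumptions for illustration; the abstract does not specify the paper's scoring function.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_distribution(decoder_state: Vector, encoder_states: List[Vector]) -> List[float]:
    """Probability of pointing at each input position (dot-product scoring, assumed)."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    return softmax(scores)


def point_to_span(start_state: Vector, end_state: Vector,
                  encoder_states: List[Vector]) -> Tuple[int, int]:
    """Greedily pick a start position, then an end position at or after it."""
    start_probs = pointer_distribution(start_state, encoder_states)
    start = max(range(len(start_probs)), key=start_probs.__getitem__)
    end_probs = pointer_distribution(end_state, encoder_states)
    end = max(range(start, len(end_probs)), key=end_probs.__getitem__)
    return start, end


# Toy encoder states for a four-token input.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
probs = pointer_distribution([2.0, 0.0], encoder_states)
span = point_to_span([2.0, 0.0], [0.0, 2.0], encoder_states)
```

Because the output is a position in the input rather than a vocabulary item, a pointer-style decoder can emit entity boundaries directly, which is consistent with the abstract's claim that structured information is generated without a separate tagging step.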

English keywords:

encoder-decoder architecture; pointer mechanism; Tibetan medicine texts; joint entity and relation extraction

Authors:

高兴, 拥措


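Author affiliations:

School of Information Science and Technology, Tibet University, Lhasa, Tibet 850000

Tibet Autonomous Region Key Laboratory of Tibetan Information Technology and Artificial Intelligence, Lhasa, Tibet 850000

Engineering Research Center of Tibetan Information Technology, Ministry of Education, Lhasa, Tibet 850000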

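Chinese keywords:

编码器-解码器架构 (encoder-decoder architecture); 指针机制 (pointer mechanism); 藏医药文本 (Tibetan medicine texts); 实体关系联合抽取 (joint entity and relation extraction)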

Publication year:

2024

DOI:

10.16249/j.cnki.2096-4617.2024.04.013

Journal: 高原科学研究 (Plateau Science Research)

ISSN:

Year, volume (issue): 2024, 8(4)