
Entity Relation Extraction Based on Pre-trained Language Model for Tibetan Medicine

Texts in the field of Tibetan medicine are mainly stored in unstructured form, and extracting information from them plays an important role in mining Tibetan medical knowledge. To address the weak semantic representation and low nested-entity extraction accuracy of existing Tibetan entity relation extraction models, this paper introduces an entity relation extraction method based on a pre-trained model. The TibetanAI_ALBERT_v2.0 pre-trained language model is used so that the model recognizes entities better, and a Span method is used to handle nested entities. On top of Dropout, a KL divergence loss term is added to improve the model's generalization ability. Experiments on the TibetanAI_TMIE_v1.0 Tibetan medicine dataset show that precision, recall, and F1 reach 84.5%, 80.1%, and 82.2%, respectively, with F1 improving by 4.4 percentage points over the baseline, which demonstrates the effectiveness of the proposed method.
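The abstract names two techniques without giving details: a Span method for nested entities, and a KL divergence term added on top of Dropout. The sketch below is one plausible reading and not the authors' implementation. It assumes candidate spans are enumerated exhaustively up to a maximum width, and that the KL term compares the predictions of two forward passes under different dropout masks, in the style of R-Drop regularization. The function names, the `max_width` parameter, the symmetric form of the KL term, and the weight `alpha` are all hypothetical.

```python
import math


def enumerate_spans(tokens, max_width):
    """Return every (start, end) span, end exclusive, of at most max_width tokens.

    Overlapping and nested spans are separate candidates, so an entity
    contained inside a longer entity can still be classified on its own.
    """
    spans = []
    for start in range(len(tokens)):
        last_end = min(start + max_width, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(logits_a, logits_b):
    """Average of KL(p || q) and KL(q || p) for two sets of logits.

    logits_a and logits_b stand in for the outputs of two forward passes
    over the same input with different dropout masks (an assumption).
    """
    p, q = softmax(logits_a), softmax(logits_b)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def total_loss(task_loss, logits_a, logits_b, alpha=1.0):
    """Task loss plus the weighted consistency term (alpha is hypothetical)."""
    return task_loss + alpha * symmetric_kl(logits_a, logits_b)


if __name__ == "__main__":
    # Placeholder tokens; a real input would be a tokenized Tibetan sentence.
    candidates = enumerate_spans(["t1", "t2", "t3"], max_width=2)
    print(candidates)
    print(total_loss(0.5, [0.2, 1.3, -0.4], [0.1, 1.5, -0.2]))
```

In this reading, each candidate span would be scored by a classifier on top of the encoder, and the consistency term penalizes disagreement between the two dropout-perturbed predictions; it is zero when both passes produce identical distributions.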

Keywords: Tibetan medicine; entity relation extraction; pre-trained language model

周青、拥措、拉毛东只、尼玛扎西


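School of Information Science and Technology, Tibet University, Lhasa, Tibet 850000

Tibet Autonomous Region Key Laboratory of Tibetan Information Technology and Artificial Intelligence, Lhasa, Tibet 850000

Engineering Research Center of Tibetan Information Technology, Ministry of Education, Lhasa, Tibet 850000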


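Funding: Science and Technology Department of the Tibet Autonomous Region project (XZ202401JD0010); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0116100)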

2024

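Journal of Chinese Information Processing
Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences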


Indexed in: CSTPCD, CHSSCD, Peking University Core Journals
Impact factor: 0.8
ISSN: 1003-0077
Year, volume (issue): 2024, 38(8)