Research on an Attribute Extraction Method for Traditional Chinese Medicine Based on BERT-CRF

Qiao Bo (乔波)¹, Yuan Quan (袁铨)¹, Zhou Zihao (周子濠)¹

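Author information

1. College of Information and Intelligent Science and Technology, Hunan Agricultural University, Changsha 410128, China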

Abstract

In natural language processing, attribute extraction suffers from low accuracy and from the difficulty of obtaining large-scale training data. To address these problems, this study proposes a BERT-CRF-based method for extracting attributes of Chinese herbal medicines. The method recasts attribute extraction as a sequence labeling task and combines the rich semantic information of the pre-trained language model BERT with the ability of a conditional random field (CRF) to model contextual features, which improves extraction precision. A dataset for Chinese herbal medicine attribute extraction was constructed from books and web data, and the BERT-CRF method was applied to both the public MSRA dataset and this dataset. The results show that the model has a significant advantage over other sequence labeling models in precision, recall, and F1 score, confirming its effectiveness for attribute extraction of Chinese herbal medicines.
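The sequence labeling formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the BIO tag scheme, the label name `PROP`, the sample sentence, and all scores are hypothetical, and the hand-written emission scores stand in for what a BERT encoder would produce. The first function turns annotated attribute spans into character-level tags; the second performs Viterbi decoding over a linear-chain CRF score (emission plus transition terms).

```python
from typing import Dict, List, Tuple


def spans_to_bio(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn (start, end_exclusive, label) spans into character-level BIO tags."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag path under a linear-chain CRF score.

    The path score is the sum of per-position emission scores and
    tag-to-tag transition scores (missing transitions count as 0.0).
    """
    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(
                tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0)
            )
            scores[cur] = (
                best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final tag
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical example: the last two characters are annotated as a property span
gold = spans_to_bio("当归性温", [(2, 4, "PROP")])

TAGS = ["O", "B-PROP", "I-PROP"]
# Stand-in emission scores (one dict per character); a real model would compute these
EMISSIONS = [
    {"O": 2.0, "B-PROP": 0.0, "I-PROP": 0.0},
    {"O": 2.0, "B-PROP": 0.0, "I-PROP": 0.0},
    {"O": 1.0, "B-PROP": 0.8, "I-PROP": 0.0},
    {"O": 0.0, "B-PROP": 0.0, "I-PROP": 2.0},
]
# Transition scores: penalize the invalid O -> I-PROP move, reward B-PROP -> I-PROP
TRANSITIONS = {("O", "I-PROP"): -10.0, ("B-PROP", "I-PROP"): 1.0}
predicted = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
```

In this toy setup, picking the best tag independently at each position would label the third character `O` and the fourth `I-PROP`, which is not a valid BIO sequence. The transition scores are what let the decoder prefer a path that opens the span with `B-PROP`, which is the kind of label dependency a CRF layer adds on top of per-token scores.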

Key words

Natural language processing/Attribute extraction/Pre-trained language model/Conditional random field

Publication year: 2024

Heilongjiang Science (黑龙江科学)

Heilongjiang Academy of Sciences (黑龙江省科学院)

Impact factor: 1.014

ISSN: 1674-8646
