
Chinese Medical Named Entity Recognition Based on RBAC Model

Chinese medical named entity recognition aims to extract structured entities from unstructured data, and current mainstream approaches rely on large amounts of training data. To address the scarcity of training data for Chinese medical named entity recognition, this article proposes a RoBERTa-BiGRU-Attention-CRF (RBAC) model based on joint word segmentation, together with a data enhancement method for named entity recognition based on semantic search. First, a pretrained model and a bidirectional gated recurrent unit (BiGRU) extract a deep bidirectional semantic representation of the text, which is then fed to both a word segmentation module and a named entity recognition module. The word segmentation module uses a conditional random field (CRF) to obtain word segmentation information. The named entity recognition module uses a BiGRU and multi-head attention to obtain a mixed semantic representation, which is then fed to a CRF to produce the tag sequence for named entity recognition. Experiments on the CCKS2019 Chinese electronic medical record dataset show that the method reaches an F1 of 90.5% with a small amount of training data, which demonstrates its effectiveness.
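Both modules described in the abstract end in a CRF that turns per-character scores into a tag sequence. The following is a minimal sketch of that decoding step only (Viterbi search over a linear-chain CRF), not the authors' implementation. The BIO tag set, the emission scores, and the transition scores are hypothetical values chosen for illustration; in the model they would come from the encoder and from learned CRF parameters.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag]       score of `tag` at position t (hypothetical encoder output)
    transitions[prev][cur]  score of moving from tag `prev` to tag `cur`
    """
    if not emissions:
        return []

    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    # back[t - 1][tag]: previous tag on the best path ending in `tag` at position t
    back: List[Dict[str, str]] = []

    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical three-character example with BIO tags.
tags = ["O", "B", "I"]
# All transitions are neutral except O -> I, which BIO tagging forbids.
transitions = {p: {c: 0.0 for c in tags} for p in tags}
transitions["O"]["I"] = -10.0
emissions = [
    {"O": 1.0, "B": 0.5, "I": 0.0},
    {"O": 0.0, "B": 0.8, "I": 1.0},
    {"O": 0.2, "B": 0.0, "I": 1.0},
]
print(viterbi_decode(emissions, transitions, tags))
```

Picking the best tag at each position independently would give O, I, I here, which is not a valid BIO sequence. Scoring whole paths with transition scores avoids this, which is the usual motivation for placing a CRF on top of the encoder.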

Keywords: multi-task learning; pretrained model; BiGRU; multi-head attention; CRF; data enhancement

ZHANG Bin (张斌), ZHAO Tingting (赵婷婷), ZHANG Bixia (张碧霞), CHEN Yarui (陈亚瑞), WANG Yuan (王嫄)


College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China


2024

Journal of Tianjin University of Science and Technology
Tianjin University of Science and Technology


Impact factor: 0.269
ISSN: 1672-6510
Year, volume (issue): 2024, 39(5)