
Named Entity Recognition of Vascular Surgery Based on Glyph Features

As a core component of medical informatization, electronic medical records (EMRs) contain many valuable medical entities, and named entity recognition (NER) on EMRs helps advance medical research. To address the scarcity of research data and the difficulty of recognizing complex entities in vascular surgery EMRs, a small-scale specialty dataset is constructed from real clinical data from the vascular surgery department of a tertiary grade-A hospital and used as the experimental dataset, and an NER model based on glyph features is proposed. First, MacBERT (Masked Language Model (MLM) as correction Bidirectional Encoder Representations from Transformers) generates dynamic character vectors, and glyph information is introduced along two dimensions: the Chinese four-corner code and the Wubi code. The text representations are then fed into a Bidirectional Gated Recurrent Unit (BiGRU) and a Gated Dilated Convolutional Neural Network (DGCNN) for feature extraction, and the two outputs are concatenated. Finally, a multi-head self-attention mechanism captures the relationships among elements within the sequence, and a Conditional Random Field (CRF) performs label decoding. Experimental results show that the proposed model achieves precision, recall, and F1 scores of 96.45%, 97.77%, and 97.10%, respectively, on the self-constructed vascular surgery dataset, outperforming all comparison models.
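The CRF label-decoding step named in the abstract can be illustrated with a minimal Viterbi sketch. This is not the authors' implementation: the label set (`O`, `B-DIS`, `I-DIS`), the emission scores, and the transition scores below are hypothetical values chosen only for illustration. In the described model the emission scores would come from the encoder (MacBERT, BiGRU, DGCNN, and self-attention), and the transition scores would be learned by the CRF.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label sequence for one sentence.

    emissions[t][y]   : score of label y at position t (encoder output)
    transitions[a][b] : score of moving from label a to label b
    """
    if not emissions:
        return []
    # best[t][y] = best score of any path that ends with label y at position t
    best = [{y: emissions[0][y] for y in labels}]
    # back[t - 1][y] = previous label on the best path ending with y at t
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label to the start.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical label set and scores (assumptions, not from the paper).
labels = ["O", "B-DIS", "I-DIS"]
transitions = {
    "O": {"O": 0.0, "B-DIS": 0.0, "I-DIS": -1e4},  # O -> I-DIS is invalid in BIO
    "B-DIS": {"O": 0.0, "B-DIS": 0.0, "I-DIS": 1.0},
    "I-DIS": {"O": 0.0, "B-DIS": 0.0, "I-DIS": 0.5},
}
emissions = [
    {"O": 0.1, "B-DIS": 2.0, "I-DIS": 0.2},
    {"O": 1.0, "B-DIS": 0.1, "I-DIS": 0.9},
    {"O": 2.0, "B-DIS": 0.0, "I-DIS": 0.3},
]

decoded = viterbi_decode(emissions, transitions, labels)
greedy = [max(labels, key=lambda y: e[y]) for e in emissions]
print(decoded)  # ['B-DIS', 'I-DIS', 'O']
print(greedy)   # ['B-DIS', 'O', 'O']
```

With these toy scores, choosing the best label independently for each token cuts the entity short after its first character. Decoding the whole sequence with transition scores keeps the second character inside the entity, which is the usual motivation for placing a CRF on top of the encoder.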

Keywords: Electronic Medical Record (EMR); vascular surgery; Named Entity Recognition (NER); feature fusion; deep learning

张华青、夏张涛、陆晓庆、童基均


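Department of Clinical Medical Engineering, The Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, Zhejiang 310009, China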

School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China


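Funding: Zhejiang Provincial Natural Science Foundation; Zhejiang Provincial Basic Public Welfare Research Program

Grant numbers: LQ22F010006; LTGY23H170004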

2024

Journal: Computer Engineering (计算机工程)
Sponsors: East China Institute of Computing Technology; Shanghai Computer Society


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.581
ISSN: 1000-3428
Year, volume (issue): 2024, 50(8)