A Method for Extracting Data Elements from Chinese Electronic Medical Records

Guo Weijia¹, Guo Shaoyou²

Author information

1. Henan Provincial Library, Zhengzhou 450052
2. School of Information Management, Zhengzhou University, Zhengzhou 450001

Abstract

Purpose/Significance: A method based on national standards is proposed for extracting data elements from electronic medical records (EMR), with the aim of enabling fine-grained sharing of EMR data. Method/Process: ALBERT, BiLSTM, and CRF models are used to perform sequence labeling on EMR text, and a set of candidate data elements is generated from the labeling results. For each candidate data element, its contextual information is collected to form an enhanced key vector. The similarity between this vector and the standard vector is then calculated to determine whether the candidate data element is valid. Result/Conclusion: The method achieves an F1 value of 90.32%, indicating good performance.
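The two post-labeling steps described in the abstract (generating candidates from the labeling results, then validating each candidate by vector similarity) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not specify the tagging scheme, how context is combined with the key, the similarity measure, or the decision threshold, so the BIO tags, the `KEY`/`VAL` labels, the mean-pooling of vectors, the use of cosine similarity, the `threshold=0.8` default, and all function names below are assumptions made for the example.

```python
import math


def decode_bio(tokens, tags):
    """Group tokens into (label, text) spans, assuming a BIO tagging scheme."""
    spans, label, buffer = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span begins; close the one in progress, if any.
            if label is not None:
                spans.append((label, "".join(buffer)))
            label, buffer = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            buffer.append(token)
        else:
            # "O" or an inconsistent "I-" tag ends the current span.
            if label is not None:
                spans.append((label, "".join(buffer)))
            label, buffer = None, []
    if label is not None:
        spans.append((label, "".join(buffer)))
    return spans


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (assumed pooling choice)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_valid_candidate(key_vector, context_vectors, standard_vector,
                       threshold=0.8):
    """Build an enhanced key vector from the key and its context, then
    accept the candidate if it is close enough to the standard vector.
    The threshold value is hypothetical."""
    enhanced = mean_vector([key_vector] + list(context_vectors))
    return cosine_similarity(enhanced, standard_vector) >= threshold
```

In practice the token vectors would come from the ALBERT encoder and the standard vector from the corresponding data element in the national standard; here they are plain Python lists so that the sketch stays self-contained.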

Keywords

electronic medical records (EMR) / data element / ALBERT / sequence labeling / token vector

Publication year

2024

Journal of Medical Informatics

Chinese Academy of Medical Sciences

CSTPCD

Impact factor: 1.348

ISSN: 1673-6036
