
Fingerprint Feature Extraction of Electronic Medical Records Based on Few-Shot Named Entity Recognition Technology

With the promulgation and implementation of the Personal Information Protection Law of the People's Republic of China, the Data Security Law of the People's Republic of China, and other relevant laws and regulations, the protection of electronic medical record data has attracted wide attention. Identifying electronic medical records quickly and efficiently is the first step in data protection and an important research topic in the field of data security. This paper proposes a fingerprint feature extraction method for electronic medical records based on few-shot named entity recognition. First, an encoder is trained on public datasets to obtain a broad text feature space. Next, the encoder is fine-tuned on an electronic medical record dataset, and a prototypical network is used to represent the entity type labels. Finally, features are extracted from the electronic medical records to obtain a fingerprint feature of the form "entity type + entity set". Experimental results show that the method outperforms the comparison models on the I2B2 dataset and effectively improves privacy protection for electronic medical record data.
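The last two steps of the method (representing entity types with prototypes, then collecting an "entity type + entity set" fingerprint) can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D vectors stand in for encoder outputs, the labels `PATIENT`, `DATE`, and `O` are hypothetical, and nearest-prototype assignment by squared Euclidean distance is an assumption about how a prototypical network would be applied here.

```python
from collections import defaultdict


def build_prototypes(support):
    """Average the support embeddings of each entity type into one prototype."""
    sums = {}
    counts = defaultdict(int)
    for label, vec in support:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def nearest_prototype(vec, prototypes):
    """Return the label whose prototype is closest (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: sq_dist(vec, prototypes[label]))


def fingerprint(spans, prototypes, ignore_label="O"):
    """Group recognized entity strings by predicted type: type -> set of entities."""
    result = defaultdict(set)
    for text, vec in spans:
        label = nearest_prototype(vec, prototypes)
        if label != ignore_label:  # skip spans classified as non-entities
            result[label].add(text)
    return dict(result)


# Toy support set: a few labelled embeddings per entity type (hypothetical values).
support = [
    ("PATIENT", [1.0, 0.0]),
    ("PATIENT", [0.8, 0.2]),
    ("DATE", [0.0, 1.0]),
    ("DATE", [0.2, 0.8]),
    ("O", [-1.0, -1.0]),
]
prototypes = build_prototypes(support)

# Toy query spans with hand-made embeddings in place of real encoder outputs.
spans = [
    ("John Smith", [0.9, 0.1]),
    ("2019-03-04", [0.1, 0.9]),
    ("the", [-0.9, -1.1]),
]
fp = fingerprint(spans, prototypes)
print(fp)
```

In this sketch the fingerprint is a plain mapping from entity type to a set of strings, so two documents could be compared type by type; how the paper matches fingerprints is not stated in the abstract.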

data security; electronic medical records; contrastive learning; named entity recognition; few-shot learning

Wang Yaxin (王亚欣), Zhang Jian (张健)


College of Cyber Science, Nankai University, Tianjin 300350, China

Tianjin Key Laboratory of Network and Data Security Technology, Tianjin 300350, China


National Key Research and Development Program of China (2022YFB3103202); Tianjin Key Research and Development Program (20YFZCGX00680)

2024

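Netinfo Security (信息网络安全)
The Third Research Institute of the Ministry of Public Security; Technical Committee on Computer Security, China Computer Federation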


Indexed in: CSTPCD; CHSSCD; Peking University Core Journals
Impact factor: 0.814
ISSN: 1671-1122
Year, volume (issue): 2024, 24(10)