医学新知 2024, Vol. 34, Issue 3: 312-321. DOI: 10.12173/j.issn.1004-5511.202308006

Research on real-world knowledge mining and knowledge graph completion (Ⅲ): structured information extraction from real-world data of bladder cancer based on regular expressions

Ma Wenhao (马文昊) 1, Shi Hanyu (石涵予) 2, Huang Qiao (黄桥) 3, Huang Xing (黄兴) 4, Wang Yongbo (王永博) 3, Wang Shichun (王诗淳) 3, Ren Xiangying (任相颖) 3, Shi Yue (施悦) 5, Jin Yinghui (靳英辉) 3, Yan Siyu (阎思宇) 3

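Author information

  • 1. Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University (Wuhan 430071); Second Clinical College, Wuhan University (Wuhan 430071)
  • 2. Hongyi Honor College, Wuhan University (Wuhan 430072)
  • 3. Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University (Wuhan 430071)
  • 4. Department of Urology, The First Affiliated Hospital, Zhejiang University School of Medicine (Hangzhou 310003)
  • 5. Information Center, Zhongnan Hospital of Wuhan University (Wuhan 430071)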

Abstract

With the development of medical big data, real-world studies (RWS) have received increasing attention in recent years and show promising prospects. However, implementing RWS still poses challenges that have prompted extensive discussion among scholars, the most urgent of which is the unstructured nature of real-world data (RWD). Using regular expressions, this study applied a rule-based information extraction method to extract structured information from the admission records, pathology reports, surgical records, and imaging records of bladder cancer patients treated at Zhongnan Hospital of Wuhan University in recent years, and evaluated the extraction performance using accuracy and recall, aiming to provide a reference for subsequent research.
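
As a rough illustration of the approach summarized in the abstract, the sketch below applies a few regular-expression rules to one record and scores the result against a manual annotation. Everything in it is an assumption made for illustration: the field names, the patterns, the synthetic sample sentence, and the gold annotation are hypothetical and are not the rules or data used in the study. The abstract names accuracy and recall as metrics without defining them; precision is computed here as one common reading of the first.

```python
import re

# Hypothetical rules: field name -> pattern with exactly one capture group.
# Illustrative only; these are not the rules used in the study.
RULES = {
    "tumor_size_cm": re.compile(r"(\d+(?:\.\d+)?)\s*(?:cm|厘米)"),
    "t_stage": re.compile(r"p?(T(?:is|a|[1-4][ab]?))"),
    "grade": re.compile(r"(高级别|低级别)"),
}


def extract(text):
    """Return the first captured value per field, or None if a rule finds nothing."""
    result = {}
    for field, pattern in RULES.items():
        match = pattern.search(text)
        result[field] = match.group(1) if match else None
    return result


def precision_recall(predicted, gold):
    """Score extracted fields against a manually annotated gold standard."""
    extracted = sum(1 for value in predicted.values() if value is not None)
    expected = sum(1 for value in gold.values() if value is not None)
    correct = sum(
        1
        for field, value in predicted.items()
        if value is not None and gold.get(field) == value
    )
    precision = correct / extracted if extracted else 0.0
    recall = correct / expected if expected else 0.0
    return precision, recall


# Synthetic sentence written for this example, not taken from any real record.
SAMPLE = "膀胱肿物大小约3.5cm,病理分期pT2a,高级别尿路上皮癌。"

# Hypothetical annotation; "lymph_node" has no rule, so recall drops below 1.
GOLD = {
    "tumor_size_cm": "3.5",
    "t_stage": "T2a",
    "grade": "高级别",
    "lymph_node": "阴性",
}

if __name__ == "__main__":
    fields = extract(SAMPLE)
    print(fields)
    print(precision_recall(fields, GOLD))
```

In this toy case every extracted value is correct, but one annotated field has no matching rule, which is the kind of gap a recall measure is meant to expose in a rule-based extractor.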

Key words

Real-world data / Information extraction / Regular expression / Natural language processing / Electronic medical record data / Bladder cancer


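Funding

General Program of the National Natural Science Foundation of China (82174230)

Youth Interdisciplinary Special Project of Zhongnan Hospital of Wuhan University (ZNQNTC2023006)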

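Publication year

2024
医学新知
Zhongnan Hospital of Wuhan University; Medical and Health Working Committee of the Hubei Provincial Committee of the Chinese Peasants' and Workers' Democratic Party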

CSTPCD
Impact factor: 0.243
ISSN: 1004-5511
Number of references: 19