Research on Privacy Data Identification and Measurement Based on Medical Information Text
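Zhang Kailiang 1, Zang Guoquan 1, Xiao Yang 2
Author information
- 1. School of Information Management, Zhengzhou University, Zhengzhou 450001; Zhengzhou Data Science Research Center, Zhengzhou 450001
- 2. School of Information Management, Zhengzhou University, Zhengzhou 450001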
Abstract
Data classification results in current medical data industry standards are vague, and the classification factors lack quantitative measurement. To address this, this study mines medical information text to measure medical data privacy from an objective cognitive perspective, providing a reference for verifying and improving current medical data classification results. The sources for identifying medically sensitive data are medical data industry standards, laws and regulations, academic papers, and breach cases. The privacy data identification model is built on sensitive data units, which are composed of sensitive words: sensitive nouns (data items), sensitive verbs, and sensitive degree words. The privacy measurement model is built on indicators of these sensitive words, including their sensitivity, semantic strength, and text strength. The results show that medical application data (medical test data, treatment process data, medical record data) rank highest in privacy, followed by health status data (chief complaint and past medical history, history of present illness and lifestyle, physical examination data), with these two categories together forming the most private group. Medical payment data (medical expense data, payment method data, medical insurance data) come next, and personal attribute data (personal identity data, personal statistical data, personal contact data) rank lowest.
Keywords
medical information text/personal privacy/privacy data identification/privacy measurement
Funding
Major Program of the National Social Science Fund of China (21&ZD338)
Publication year
2024