Self-Supervised Learning Method for Disease Diagnosis Based on Graph Structural Clustering

Abstract: In recent years, graph self-supervised learning has been applied to disease diagnosis to alleviate the scarcity of medical label information and the burden of manual annotation. However, the performance of existing graph self-supervised learning methods relies heavily on high-quality positive and negative samples, which limits the flexibility and generalizability of disease diagnosis. Moreover, patients' multi-modal data are not fully exploited when constructing medical heterogeneous attributed graphs, which degrades diagnostic performance. This study therefore proposes SC4DD, a self-supervised learning framework for Disease Diagnosis based on the Structural Clustering of a medical heterogeneous attributed graph. The framework uses patients' structured data and unstructured clinical text summaries to construct a medical heterogeneous attributed graph, and generates pseudo-labels for nodes with a structural clustering algorithm on the graph. Because different meta-paths differ in importance for learning patient embeddings, and different modalities of medical data differ in their influence on diagnosis results, a heterogeneous Graph Neural Network (GNN) with an attention mechanism is introduced as the encoder. The pseudo-labels serve as self-supervised signals that help the encoder learn the attention coefficients and the patient embeddings. Experimental results on the MIMIC-III dataset show that SC4DD outperforms the baseline methods and effectively improves disease-diagnosis performance. In particular, compared with the best-performing baseline, HeCo, SC4DD improves the Macro-F1 score by 1.46%, 0.97%, and 0.94%, and the Micro-F1 score by 0.91%, 0.84%, and 0.52%, with 2%, 3%, and 4% of nodes labeled, respectively.
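The abstract reports both Macro-F1 and Micro-F1. As a rough illustration of how the two metrics differ, the sketch below computes them for single-label, multi-class predictions. It is not the authors' evaluation code: the function name `f1_scores` and the toy labels are hypothetical and are not drawn from MIMIC-III.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions.

    Hypothetical helper for illustration only.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # class p was predicted wrongly
            fn[t] += 1          # class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro-F1: unweighted mean of per-class F1, so rare classes count equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro-F1: F1 over counts pooled across all classes.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy example: three diagnosis classes with imbalanced support.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2]
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
print(f"Macro-F1={macro_f1:.4f}, Micro-F1={micro_f1:.4f}")
```

Macro-F1 weights every class equally, so it is sensitive to performance on rare diagnoses, whereas Micro-F1 pools the counts and is dominated by frequent classes. In the single-label setting, Micro-F1 coincides with accuracy.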

Keywords:

disease diagnosis; Electronic Medical Records (EMR); graph self-supervised learning; Graph Neural Network (GNN); medical heterogeneous attributed graph

Authors:

Zhang Zhengkang (张正康), Yang Dan (杨丹), Nie Tiezheng (聂铁铮), Kou Yue (寇月)

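Affiliations:

School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, Liaoning 114051

School of Computer Science and Engineering, Northeastern University, Shenyang, Liaoning 110169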

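Funding:

National Natural Science Foundation of China; National Natural Science Foundation of China; Scientific Research Project of the Education Department of Liaoning Province

Project numbers:

62072084; 62072086; LJKMZ20220646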

Publication year:

2024

DOI:

10.19678/j.issn.1000-3428.0068187

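Computer Engineering (计算机工程)

East China Institute of Computing Technology; Shanghai Computer Society

CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.581

ISSN: 1000-3428

Year, volume (issue): 2024, 50(7)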