A Disease Diagnosis Method Using Self-Supervised Learning Based on Graph Structural Clustering
Recently, graph self-supervised learning has been applied to disease diagnosis to alleviate the scarcity of labeled medical data and the cost of manual labeling. However, the performance of existing graph self-supervised learning methods relies heavily on high-quality positive and negative samples, which limits the flexibility and generalizability of disease diagnosis. Moreover, patients' multi-modal data are not fully exploited when constructing medical heterogeneous attributed graphs, which further degrades diagnostic performance. Therefore, this study proposes a framework called self-supervised learning based on the Structural Clustering of a medical heterogeneous attributed graph for Disease Diagnosis (SC4DD). The framework constructs a medical heterogeneous attributed graph from structured medical data and unstructured medical text, and generates pseudo-labels for nodes by running a structural clustering algorithm on the graph. Because different meta-paths differ in their importance for learning patient representations, and different modalities of medical data affect the diagnosis results differently, a heterogeneous Graph Neural Network (GNN) with an attention mechanism is introduced as the encoder. The pseudo-labels serve as self-supervised signals that help the encoder learn the attention coefficients and patient representations. Experimental results on the MIMIC-III dataset show that SC4DD outperforms the baselines and effectively improves disease-diagnosis performance. In particular, compared with the best-performing baseline (HeCo), SC4DD improves the Macro-F1 score by 1.46%, 0.97%, and 0.94%, and the Micro-F1 score by 0.91%, 0.84%, and 0.52%, when 2%, 3%, and 4% of the nodes are labeled, respectively.
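The two core ideas of the framework, pseudo-labels derived from graph structure and attention-weighted fusion over meta-paths, can be illustrated with a minimal sketch. The abstract does not specify which structural clustering algorithm or attention formulation SC4DD uses, so everything below is an assumption made for illustration: the clustering is a simplified SCAN-style similarity threshold (with an arbitrary `eps`), the attention scores are passed in by hand rather than learned, and the function names `structural_pseudo_labels` and `fuse_meta_paths` are hypothetical.

```python
import math
from collections import defaultdict


def structural_pseudo_labels(edges, eps=0.7):
    """Assign a cluster id to every node (simplified SCAN-style clustering).

    Assumption: two adjacent nodes are merged when their structural similarity
    |N[u] & N[v]| / sqrt(|N[u]| * |N[v]|) is at least eps, where N[x] is the
    closed neighbourhood of x (x plus its neighbours).
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    closed = {n: nbrs[n] | {n} for n in nbrs}

    parent = {n: n for n in closed}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        shared = len(closed[u] & closed[v])
        sim = shared / math.sqrt(len(closed[u]) * len(closed[v]))
        if sim >= eps:
            parent[find(u)] = find(v)

    # Map each union-find root to a compact integer pseudo-label.
    roots = {}
    return {n: roots.setdefault(find(n), len(roots)) for n in sorted(closed)}


def fuse_meta_paths(embeddings, scores):
    """Softmax-weight the per-meta-path embeddings of one patient node.

    Assumption: `scores` stands in for attention logits that a trained
    encoder would produce; here they are supplied directly.
    """
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(embeddings[0])
    fused = [
        sum(w * emb[i] for w, emb in zip(weights, embeddings))
        for i in range(dim)
    ]
    return weights, fused


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
pseudo_labels = structural_pseudo_labels(toy_edges, eps=0.7)

# Two hypothetical meta-path views of one patient, with equal attention logits.
weights, fused = fuse_meta_paths([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

On the toy graph, the bridge edge has low structural similarity, so the two triangles receive different pseudo-labels. In a full pipeline these labels would supervise the encoder through a classification loss, which is how the attention weights would be learned instead of being fixed as they are here.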