Prediction of Unplanned Postoperative Readmission Based on Heterogeneous Data

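Unplanned readmission is an important signal for hospital risk management and an important indicator of medical quality. Readmission prediction has become an important task for healthcare systems, and many researchers have proposed effective prediction methods based on machine learning. Most of them, however, either study a single structured data source or fuse heterogeneous data only by concatenation. The former does not make full use of the rich data and information in electronic medical records, and the latter does not fuse the information in heterogeneous data well. To address these problems, this paper proposes a heterogeneous data fusion method based on CTFN, which combines the text of patients' discharge summaries with the cross-sectional data generated during hospitalization to predict readmission risk. The prediction model is built in three steps. First, a RoBERTa model extracts features from the discharge summary and produces a representation matrix. Second, a CNN model learns the features of the cross-sectional data and produces a second representation matrix. Finally, the CTFN method fuses the two representation matrices into a representation matrix of the heterogeneous data, and a linear-layer classifier produces the final prediction. The CTFN fusion method uses the tensor outer product to fuse multiple unimodal representation matrices, and adds a CNN model and a residual structure to strengthen intra-modal and inter-modal learning on heterogeneous data. The method was validated on clinical data from a public hospital. The experimental results show that it performs well: recall reached 76.1% and the area under the ROC curve reached 71.5%, both higher than the baseline models compared. This confirms that heterogeneous data can improve classifier prediction, and that the CTFN fusion method fuses information across heterogeneous data better, further improving classifier prediction.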
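
The fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give layer sizes, kernel weights, or the exact form of the residual connection, so everything here is an assumption made for illustration. Each modality is reduced to a single vector (`h_text` standing in for the RoBERTa output, `h_tab` for the CNN output on cross-sectional data), the kernel is fixed rather than learned, the residual is a simple addition of pooled fused features to each unimodal vector, and plain Python replaces a deep learning framework. All function names are hypothetical.

```python
import math


def outer(u, v):
    """Tensor (outer) product of two vectors: z[i][j] = u[i] * v[j]."""
    return [[ui * vj for vj in v] for ui in u]


def conv2d_same(z, kernel):
    """Zero-padded 2D cross-correlation (as in CNN layers) followed by ReLU.

    The output has the same shape as z. The kernel must be square with odd size.
    """
    rows, cols = len(z), len(z[0])
    k = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    r, c = i + di, j + dj
                    if 0 <= r < rows and 0 <= c < cols:
                        acc += kernel[di + k][dj + k] * z[r][c]
            out[i][j] = max(acc, 0.0)  # ReLU
    return out


def fuse(h_text, h_tab, kernel):
    """Outer-product fusion, convolution, then an assumed residual step."""
    z = outer(h_text, h_tab)          # inter-modal interactions, no appended 1s
    f = conv2d_same(z, kernel)        # convolve the fused map
    # Assumption: pool the fused map back to each modality's dimension.
    row_pool = [sum(row) / len(row) for row in f]        # len(h_text) values
    col_pool = [sum(col) / len(f) for col in zip(*f)]    # len(h_tab) values
    # Assumption: residual addition restores the unimodal information.
    text_res = [a + b for a, b in zip(row_pool, h_text)]
    tab_res = [a + b for a, b in zip(col_pool, h_tab)]
    return text_res + tab_res         # concatenate into the final representation


def predict(fused, weights, bias):
    """Linear layer plus sigmoid, giving a readmission probability."""
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs; the identity kernel leaves the fused map unchanged before ReLU.
identity_kernel = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
fused = fuse([1.0, 2.0], [1.0, 0.0, -1.0], identity_kernel)
risk = predict(fused, weights=[0.0] * len(fused), bias=0.0)
```

Because no constant 1 is appended to either vector before the outer product, the fused map contains only cross-modal terms. That is why the sketch adds the unimodal vectors back in before concatenation.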
A Predictive Method for Unplanned Postoperative Readmission Risk Based on Heterogeneous Data
Objective Unplanned readmissions are a critical indicator for hospital risk management and a key measure of medical quality. Predicting readmissions poses a significant challenge within healthcare systems. While numerous existing methods have demonstrated remarkable results, there is still considerable room for improvement. For instance, many studies have focused on a single structured data set or have used basic concatenation to integrate heterogeneous data resources. These approaches do not fully utilize the extensive data available in electronic medical records and are often ineffective at integrating heterogeneous data resources. To address these issues, this paper proposes using machine learning combined with patients' heterogeneous clinical data to develop a predictive model for unplanned postoperative readmission.

Methods This model will identify patients at high risk of readmission at an early stage, enabling timely intervention. Such intervention will reduce the waste of medical funds and public medical resources and alleviate the medical burden on patients. The discharge summary provides a comprehensive account of the entire process, from admission to discharge, and is the most important written document describing the circumstances of the hospitalization. Patient cross-sectional data, which fall into three main categories (basic patient information, data generated during the hospital stay, and past medical records), are strongly correlated with the likelihood of readmission. This correlation has been corroborated by numerous scholars in experimental studies. By combining these key clinical data sets with data mining, machine learning, and other analytical techniques, it is possible to construct a predictive model for unplanned postoperative readmission that effectively identifies patients at high risk. This paper addresses two principal areas of enquiry.

1) Addressing missing data and uneven class distribution. The first goal is to analyze the patients' discharge summaries and cross-sectional data objectively, identifying their characteristics and deficiencies. Several approaches are used to handle data loss and heterogeneous data structures. Missing values in the cross-sectional data are filled after the samples are grouped by class: for positive patients, discrete features are filled with the median and continuous features with the mean of that feature, and the same method is applied to negative patients. The uneven class distribution in the cross-sectional data is handled with category weighting. For text data, BCEWithLogitsLoss is used to counteract the uneven sample distribution.

2) Proposing a CTFN-based method for heterogeneous data fusion. The method begins by using a RoBERTa model and a CNN model to extract representation matrices from the discharge summary text and the cross-sectional data, respectively. These matrices capture the essential features of the data in a format suitable for fusion. The CTFN heterogeneous data fusion network then fuses these representation matrices to obtain a comprehensive heterogeneous representation matrix. CTFN builds on TFN by directly calculating the tensor product to fuse the heterogeneous representations. This allows the data to be fused without expanding the dimension of the unimodal representation matrix, thus preserving intra-modal information. To further enhance the fused representation, a CNN model convolves it, amplifying important features. However, this process may lose intra-modal information, because the dimension expansion of the single-modality representations is abandoned before the tensor product of the heterogeneous representation matrices is computed. To address this, a residual design is used. The fusion representation matrix is fused with the single-modality representation matrices of both the discharge summary and the cross-sectional data. These fused representation matrices are then concatenated to form the final heterogeneous data representation matrix. This approach compensates for the missing intra-modal information and enhances both intra-modal and inter-modal learning. The result is a more comprehensive expression of the information in heterogeneous data, which ultimately improves the model's performance.

Results and Discussions The primary contribution is to propose using the heterogeneous data in patients' electronic medical records to comprehensively determine the likelihood of readmission, and to provide an effective methodology for integrating heterogeneous data. Based on deep learning, the CTFN method effectively integrates data from two different structures and thereby yields favorable prediction outcomes. The CTFN method was verified on clinical data from a public hospital, and the experimental results demonstrated the effectiveness of the improvements proposed in CTFN. Furthermore, the method performs best among all the compared models, with markedly superior predictive efficacy. The recall rate is 76.1% and the AUC is 71.5%, both superior to those of the baseline models under comparison. The method is expected to meet the recall and accuracy requirements of real-world use, and to provide technical support that helps doctors identify patients at high risk of readmission. The accuracy of predicting the risk of unplanned postoperative readmission is significantly higher with heterogeneous data than with a single structured data set.

Conclusions This study proposes a CTFN-based predictive method that fuses heterogeneous data for unplanned postoperative readmission. It demonstrates that, when predicting postoperative readmission, incorporating more highly relevant clinical data sets with varying structures and formats can effectively improve the predictive performance of the model. Additionally, CTFN exhibits superior capabilities in heterogeneous data fusion compared with existing methods. The representation matrix generated through CTFN is more effective at representing the information contained within heterogeneous data, thereby enabling the model to make more accurate judgments based on a greater quantity of data. The risk of readmission is closely linked to a patient's medical history, in-hospital recovery, and post-discharge social activities. The current experiment used a subset of medical records and focused on in-hospital recovery, and it could be expanded to include more comprehensive data. Incorporating patients' post-discharge social activities could further enhance the model's predictive capability. The model's predictive performance could also be improved by integrating medical imaging data, such as CT scans and X-rays, once they are properly fused with other data types using the CTFN approach.
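
The data preparation in item 1) of the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: missing values are represented as `None`, the set of discrete columns is supplied by the caller, and the loss is a plain-Python stand-in for a weighted binary cross-entropy on logits (the abstract names BCEWithLogitsLoss but does not say how it is weighted). The function names and the `pos_weight` parameter as used here are hypothetical.

```python
import math
from statistics import mean, median


def impute_by_class(rows, labels, discrete_cols):
    """Fill None values separately for each class.

    Discrete columns get the class median, continuous columns the class mean.
    """
    filled = [list(r) for r in rows]
    n_cols = len(rows[0])
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        for c in range(n_cols):
            observed = [rows[i][c] for i in idx if rows[i][c] is not None]
            if not observed:
                continue  # no observed value in this class; leave the gaps
            fill = median(observed) if c in discrete_cols else mean(observed)
            for i in idx:
                if filled[i][c] is None:
                    filled[i][c] = fill
    return filled


def weighted_bce_with_logits(logits, labels, pos_weight):
    """Mean binary cross-entropy on raw logits, positives scaled by pos_weight."""
    total = 0.0
    for x, y in zip(logits, labels):
        # Numerically stable log(sigmoid(x)); log(1 - sigmoid(x)) = log_sig - x.
        if x >= 0:
            log_sig = -math.log1p(math.exp(-x))
        else:
            log_sig = x - math.log1p(math.exp(x))
        log_one_minus = log_sig - x
        total += -(pos_weight * y * log_sig + (1 - y) * log_one_minus)
    return total / len(logits)


# Toy data: column 0 is discrete, column 1 is continuous; label 1 = readmitted.
rows = [[1, 10.0], [None, 20.0], [3, None], [0, 5.0], [0, None]]
labels = [1, 1, 1, 0, 0]
filled = impute_by_class(rows, labels, discrete_cols={0})

# Assumption: weight positives by the negative-to-positive ratio.
pos_weight = labels.count(0) / labels.count(1)
loss = weighted_bce_with_logits([0.0] * len(labels), labels, pos_weight)
```

Weighting positives by the negative-to-positive ratio is one common choice when readmitted patients are the minority class. The abstract does not state which weights were used.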

heterogeneous data; deep learning; tensor fusion; readmission; convolutional networks; residual structure

俞凯、董小锋、袁贞明、崔朝健、罗伟斌

School of Information Science and Technology, Hangzhou Normal University, Hangzhou 310016, Zhejiang, China

2025

Advanced Engineering Sciences (工程科学与技术)
Sichuan University

Peking University Core Journal
Impact factor: 0.913
ISSN:2096-3246
Year, volume (issue): 2025, 57(1)