A Predictive Method for Unplanned Postoperative Readmission Risk Based on Heterogeneous Data
Objective: Unplanned readmissions are a critical indicator for hospital risk management and a key measure of medical quality. Predicting readmissions poses a significant challenge within healthcare systems. While numerous existing methods have demonstrated remarkable results, there is still considerable room for improvement. For instance, many studies have focused on a single structured data set or have used basic concatenation to integrate heterogeneous data resources. These approaches do not fully utilize the extensive data available in electronic medical records and are often ineffective at integrating heterogeneous data. To address these issues, this paper proposes combining machine learning with heterogeneous patient clinical data to develop a predictive model for unplanned postoperative readmission.

Methods: This model will identify patients at high risk of readmission at an early stage, enabling timely intervention. Such intervention will reduce the waste of medical expenditure and public medical resources and alleviate the medical burden on patients. The discharge summary provides a comprehensive account of the entire process, from admission to discharge, and is the most crucial written document describing the circumstances of the hospitalization. Patient cross-sectional data, which comprise three main categories (basic patient information, data generated during the hospital stay, and past medical records), are strongly correlated with the likelihood of readmission. This correlation has been corroborated by numerous scholars in experimental studies. By integrating these key clinical data sets with data mining, machine learning and other analytical techniques, it is possible to construct a predictive model for unplanned postoperative readmission that identifies patients at high risk. This paper addresses two principal areas of
enquiry, namely:

1) Address the issues arising from missing data and uneven class distribution. The main goal here is to analyze the patients' discharge summaries and cross-sectional data objectively, identifying their characteristics and deficiencies. To address missing data and heterogeneous data structures, several approaches are used. For missing values in cross-sectional data, a class-wise filling technique is applied after the samples are grouped by label: discrete features of positive patients are filled with the median and continuous features with the mean of that feature, and the same method is applied to negative patients. The uneven class distribution in the cross-sectional data is handled with the category weighting method. For text data, BCEWithLogitsLoss is employed to counteract the uneven sample distribution.

2) Propose a method based on CTFN for heterogeneous data fusion. The method begins by using the RoBERTa model and a CNN model to extract representation matrices from the patient's discharge summary text and cross-sectional data, respectively. These matrices capture the essential features of the data in a format suitable for fusion. The CTFN heterogeneous data fusion network then fuses these representation matrices to obtain a comprehensive heterogeneous representation matrix. CTFN builds upon the TFN by directly calculating the tensor product of the heterogeneous representations. This allows fusion without expanding the dimension of the unimodal representation matrices, thus preserving intra-modal information. To further enhance the fused representation, a CNN model convolves it, amplifying important features. However, this process may lose intra-modal information, because the dimension expansion of the single-modal representations is abandoned before the tensor product of the heterogeneous representation matrices is computed. To
address this, a residual design is used. The fusion representation matrix is fused with the single-modality representation matrices of both the discharge summary and the cross-sectional data. These fused representation matrices are then concatenated to form the final heterogeneous data representation matrix. This approach compensates for the missing intra-modal information and enhances both intra-modal and inter-modal learning. The result is a more comprehensive expression of the information in heterogeneous data, which ultimately improves the model's performance.

Results and Discussions: The primary contribution is to propose the use of heterogeneous data in patients' electronic medical records to comprehensively determine the likelihood of readmission, and to provide an effective methodology for integrating heterogeneous data. Based on deep learning, the CTFN method can effectively integrate data from two different structures, which leads to favorable prediction outcomes. CTFN was verified on clinical data from public hospitals, and the experimental results demonstrated the effectiveness of the improvements proposed in CTFN. Furthermore, the method achieves the best performance among all the compared models: the recall rate is 76.1% and the AUC is 71.5%, both of which are superior to those of the baseline model under comparison. The method should satisfy the recall and accuracy requirements of real-world use and provide technical support to assist doctors in identifying patients at high risk of readmission. Using heterogeneous data to predict the risk of unplanned readmission following surgery is significantly more accurate than using a single structured data set.

Conclusions: This study proposes a predictive method based on CTFN for the fusion of heterogeneous data in the context of unplanned postoperative
readmission. The results demonstrate that, when predicting postoperative readmission, incorporating more highly relevant clinical data sets with varying structures and formats can effectively enhance the model's predictive performance. Additionally, CTFN exhibits superior capabilities in heterogeneous data fusion compared with existing methods. The representation matrix generated through CTFN is more effective at representing the information contained within heterogeneous data, thereby enabling the model to make more accurate judgments based on a greater quantity of data. The risk of readmission is closely linked to a patient's medical history, in-hospital recovery, and post-discharge social activities. The current experiment used a subset of medical records and focused on in-hospital recovery; it could be expanded to include more comprehensive data. Incorporating patients' post-discharge social activities could further enhance the model's predictive capability. Additionally, predictive performance could be improved by integrating medical imaging data, such as CT scans and X-rays, once they are properly fused with other data types using the CTFN approach.
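
The class-wise filling step described under Methods (median for discrete features, mean for continuous features, computed separately for positive and negative patients) can be illustrated with a minimal sketch. The function name `impute_by_class`, the column names `asa_grade` and `los_days`, and the toy records are hypothetical and are not taken from the study's data; missing values are assumed to be encoded as `None`.

```python
from statistics import mean, median

def impute_by_class(rows, labels, discrete_cols, continuous_cols):
    """Fill None entries per class: median for discrete, mean for continuous."""
    filled = [dict(r) for r in rows]  # copy so the input rows stay untouched
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        for col in list(discrete_cols) + list(continuous_cols):
            # Statistics are estimated only from observed values of this class.
            observed = [rows[i][col] for i in idx if rows[i][col] is not None]
            if not observed:
                continue  # nothing to estimate from; leave the gaps as they are
            fill = median(observed) if col in discrete_cols else mean(observed)
            for i in idx:
                if filled[i][col] is None:
                    filled[i][col] = fill
    return filled

# Hypothetical toy records: label 1 = readmitted, label 0 = not readmitted.
rows = [
    {"asa_grade": 2, "los_days": 4.0},
    {"asa_grade": None, "los_days": 6.0},
    {"asa_grade": 3, "los_days": None},
    {"asa_grade": 3, "los_days": 8.0},
    {"asa_grade": 1, "los_days": 2.0},
    {"asa_grade": 1, "los_days": None},
]
labels = [1, 1, 1, 1, 0, 0]

filled = impute_by_class(rows, labels, {"asa_grade"}, {"los_days"})
```

In this toy example the missing `asa_grade` of a positive patient is filled from positive patients only, and the missing `los_days` of a negative patient is filled from negative patients only.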
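
The abstract states that BCEWithLogitsLoss is used against the uneven sample distribution of the text data but does not say how it is weighted. The sketch below assumes the common choice of scaling the positive term by the ratio of negative to positive samples; that ratio, and the helper names `pos_weight_from_labels`, `bce_with_logits` and `mean_loss`, are assumptions for illustration. It is written in plain Python to show the arithmetic rather than to reproduce any library implementation.

```python
import math

def pos_weight_from_labels(targets):
    """Ratio of negatives to positives (an assumed weighting, not from the source)."""
    n_pos = sum(1 for t in targets if t == 1)
    n_neg = len(targets) - n_pos
    return n_neg / n_pos

def bce_with_logits(logit, target, pos_weight=1.0):
    """Binary cross-entropy on a raw logit; pos_weight scales the positive term."""
    # softplus(-x) = log(1 + exp(-x)), computed without overflow for large |x|
    softplus_neg = max(-logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    # -log(sigmoid(x)) = softplus(-x);  -log(1 - sigmoid(x)) = x + softplus(-x)
    return (pos_weight * target * softplus_neg
            + (1.0 - target) * (logit + softplus_neg))

def mean_loss(logits, targets, pos_weight=1.0):
    """Average the per-sample losses over a batch."""
    losses = [bce_with_logits(x, y, pos_weight) for x, y in zip(logits, targets)]
    return sum(losses) / len(losses)

# Hypothetical batch: one readmitted patient among four.
targets = [1, 0, 0, 0]
logits = [0.0, 0.0, 0.0, 0.0]
w = pos_weight_from_labels(targets)
loss = mean_loss(logits, targets, w)
```

With this weighting, an error on the single positive sample contributes as much to the loss as the same error on all three negatives combined, which is one way to keep the minority class from being ignored.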
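
The difference between TFN-style fusion and the direct tensor product with a residual, as described under Methods, can be made concrete with a small sketch. It is a deliberate simplification and not the authors' network: it uses short vectors instead of representation matrices, omits the convolution over the fused representation, and models the residual as plain concatenation. The function names and the toy values are hypothetical.

```python
def outer(u, v):
    """Tensor (outer) product of two vectors as a len(u) x len(v) matrix."""
    return [[a * b for b in v] for a in u]

def tfn_fuse(text_vec, tab_vec):
    """TFN-style fusion: append a constant 1 to each vector before the product."""
    return outer(text_vec + [1.0], tab_vec + [1.0])

def direct_fuse_with_residual(text_vec, tab_vec):
    """Direct outer product, then concatenate the unimodal vectors as a residual."""
    fused = outer(text_vec, tab_vec)          # inter-modal interactions only
    flat = [x for row in fused for x in row]  # flatten row by row
    return flat + text_vec + tab_vec          # re-attach intra-modal information

# Hypothetical toy representations of the text and cross-sectional modalities.
text = [0.5, -1.0]
tab = [2.0, 4.0, 6.0]

tfn = tfn_fuse(text, tab)
direct = direct_fuse_with_residual(text, tab)
```

In the TFN variant the appended 1 makes the last row and last column of the product carry the unimodal vectors unchanged, which is how intra-modal information survives there. The direct product has no such row or column, so the sketch re-attaches the unimodal vectors afterwards, mirroring the role the residual design plays in the description above.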