Construction and Verification of a Pulmonary Nodule Invasion Prediction Model Based on the XGBoost Machine Learning Algorithm: A Two-center Study
Objective: To construct a clinical-radiomics model using the XGBoost machine learning algorithm to predict the pathological invasion of pulmonary nodules, and to validate its generalizability in an external cohort. Methods: A total of 248 patients with solitary pulmonary nodules diagnosed by CT were retrospectively included, and radiomic features were extracted separately from the nodules and from the surrounding 3 mm and 5 mm regions. After coarse-to-fine feature selection, a Radscore was calculated using least absolute shrinkage and selection operator (LASSO) logistic regression. Univariate and multivariate logistic regression analyses were used to identify the clinical and radiological factors associated with pulmonary nodule invasion. Combined clinical-radiomics models were then constructed using the logistic regression and XGBoost algorithms, and their generalizability was evaluated in an independent external validation cohort (n=147). Results: The combined clinical-radiological XGBoost model, which incorporated the Radscore, CT value, nodule length, and lunate sign, outperformed both the radiomics model and the combined clinical-radiological logistic regression model in predicting pulmonary nodule invasion. Its area under the curve (AUC) was 0.889 (95% CI, 0.848-0.927) in the training cohort and 0.889 (95% CI, 0.823-0.942) in the external validation cohort, showing satisfactory predictive efficacy. Conclusion: We used the XGBoost machine learning algorithm to construct a clinical-radiomics model for predicting pulmonary nodule invasion. The model showed satisfactory predictive efficacy and generalized well in an independent external validation cohort, and may help clinicians guide the diagnosis and treatment of pulmonary nodules and develop evaluation strategies.
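
To illustrate how an AUC with a 95% confidence interval of the kind reported above can be obtained from predicted probabilities, the following is a minimal sketch using only the Python standard library. The abstract does not state how the intervals were computed; the percentile bootstrap used here is an assumption, and the labels and scores are synthetic toy values, not study data. The function names auc and bootstrap_ci are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC via the Mann-Whitney statistic: the fraction of
    (positive, negative) pairs ranked correctly, ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic toy data: 1 = invasive, 0 = non-invasive (illustration only).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.30, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]

point = auc(labels, scores)  # 15 of 16 pairs ranked correctly = 0.9375
low, high = bootstrap_ci(labels, scores)
print(f"AUC = {point:.3f} (95% CI, {low:.3f}-{high:.3f})")
```

In practice the scores would be the predicted probabilities of the fitted model on the training or external validation cohort, and a larger number of bootstrap resamples would typically be used.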