A Machine Learning-Based Approach Identifying the Value of Clinical and Enhanced CT Imaging Histological Features for the Diagnosis of Benign and Malignant Pulmonary Nodules
Objective To identify the value of clinical features and enhanced CT imaging features in the diagnosis of pulmonary nodules using machine learning. Methods A retrospective study was conducted on 89 patients with pulmonary nodules confirmed by surgical specimens at the Xiangyang No.1 People's Hospital from June 2018 to July 2022, including 37 patients with benign nodules and 52 patients with malignant nodules. The patients were divided into a training set and a validation set at a ratio of 8:2. Regions of interest (ROIs) in the lesions were extracted from the plain scan, arterial phase, and venous phase images. The imaging features were extracted by software and screened by univariate analysis, multivariate analysis, and the least absolute shrinkage and selection operator (LASSO). Furthermore, machine learning methods were employed to establish models for predicting whether nodules were benign or malignant, and a diagram of weighted SHapley Additive exPlanations (SHAP) values was constructed. Finally, decision curve analysis (DCA) was employed to analyze patient benefit. Results The training set and validation set included 72 and 17 patients, respectively. A total of 2800 imaging features and 31 clinical features were extracted. After screening, 13 imaging features and 4 clinical features were retained. The clinical features of history of underlying lung disease and cytokeratin 19 fragment antigen 21-1 (CYFRA 21-1), as well as the imaging features of spiculation sign and lobulation sign, showed significant differences between the benign and malignant groups (P<0.05). Among the machine learning methods evaluated, XGBoost demonstrated the highest performance. The DCA results indicated good patient benefit. Conclusion The XGBoost model, based on enhanced CT and tumor markers, is of great value in identifying the nature of pulmonary nodules.
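For readers unfamiliar with decision curve analysis, the quantity it plots can be sketched in a few lines. The example below is a minimal illustration and not the authors' code: it assumes the standard net-benefit definition (true-positive rate minus false-positive rate weighted by the odds of the threshold probability), and the function names and toy labels and probabilities are hypothetical and unrelated to the study data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold.

    y_true: 0/1 labels (1 = malignant), y_prob: predicted probabilities.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Count true and false positives at this threshold probability.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy: treat every patient as malignant."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (not from the study).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_prob = [0.9, 0.8, 0.35, 0.4, 0.1, 0.7, 0.2, 0.6]

for pt in (0.2, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = treat_all_net_benefit(y_true, pt)
    print(f"threshold={pt:.1f} model={model_nb:.4f} treat-all={all_nb:.4f}")
```

A decision curve repeats this calculation over a range of threshold probabilities; a model is usually considered beneficial where its curve lies above both the treat-all reference and the treat-none reference, whose net benefit is zero.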