Development and validation of a random forest model based on 18F-FDG PET for predicting pathological parameters of early invasive lung adenocarcinoma
Objective: To evaluate the value of a positron emission tomography (PET)-based random forest model for predicting pathological parameters in patients with clinical stage IA non-small cell lung cancer.
Methods: Clinical, pathological and imaging data of 295 lung cancer patients who underwent preoperative fluorine-18-fludeoxyglucose (18F-FDG) PET from August 2020 to July 2022 at the First Affiliated Hospital of Wenzhou Medical University were retrospectively analyzed, including 158 cases with positive pathological signs and 137 cases with negative pathological signs. Patients were randomly divided into a training cohort (207 cases) and a validation cohort (88 cases) in a 7:3 ratio. PET-based radiomics features were extracted from the gross tumor volume and from the gross tumor volume incorporating the peritumoral 5 mm region. The model was established and validated using the random forest algorithm. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity and accuracy were used to evaluate the diagnostic performance of the model.
Results: Logistic regression analysis showed that the maximum standardized uptake value (SUVmax) differed significantly between the pathology-positive group and the pathology-negative group in the training set (both P<0.05); the clinical model based on SUVmax achieved an AUC of 0.72 in the training set and 0.71 in the validation set. The models for the tumor group and the peritumoral group achieved AUCs of 0.81 and 0.79 in the training set, and 0.82 and 0.79 in the validation set, respectively. LASSO regression applied to the tumor group and peritumoral group features selected 9 significantly correlated features. Based on these features, four machine learning models were established: decision tree (DT), support vector machine (SVM), random forest (RF), and k-nearest neighbor (kNN). The RF model (AUC=0.91, 0.88) outperformed the DT (AUC=0.73, 0.76), SVM (AUC=0.69, 0.86), and kNN (AUC=0.80, 0.81) models in both the training and validation sets, and was therefore chosen as the optimal radiomics model. Compared with the gross tumor volume model and the model based on the gross tumor volume incorporating the peritumoral 5 mm region, the comprehensive model combining tumor and peritumoral features showed promising performance, with AUCs of 0.91 in the training cohort and 0.88 in the validation cohort. The differences between the tumor group, peritumoral group, and clinical group models were statistically significant (P<0.05).
Conclusion: The predictive model based on machine learning can provide a novel tool for predicting pathological parameters in patients with lung adenocarcinoma, contributing to precise diagnosis and preoperative treatment planning in clinical decision-making.
Keywords: adenocarcinoma of lung; peritumor; machine learning; pathology; positron emission computed tomography
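The 7:3 cohort split and the evaluation metrics named in the Methods (AUC, sensitivity, specificity, accuracy) can be illustrated with a minimal sketch. This is not the study's code: the function names, the random seed, the half-up rounding rule, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration only. The AUC here is computed with the pairwise (Mann-Whitney) definition, which may differ from the software the authors used.

```python
import random


def split_cohort(n_patients, train_parts=7, total_parts=10, seed=0):
    """Shuffle patient indices and split them train_parts:(total_parts - train_parts).

    Integer arithmetic with half-up rounding is an assumption; it avoids
    floating-point surprises and gives 207/88 for 295 patients.
    """
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    n_train = (2 * n_patients * train_parts + total_parts) // (2 * total_parts)
    return indices[:n_train], indices[n_train:]


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(labels, scores, threshold=0.5):
    """Sensitivity, specificity and accuracy at a fixed (assumed) threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Cohort sizes for 295 patients split 7:3.
train_idx, val_idx = split_cohort(295)
print(len(train_idx), len(val_idx))

# Toy labels (1 = pathology-positive) and model scores; purely illustrative.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(auc(toy_labels, toy_scores))
print(classification_metrics(toy_labels, toy_scores))
```

In practice the scores would come from the fitted classifier's predicted probabilities on the validation cohort, and the threshold would be chosen on the training cohort rather than fixed in advance.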