Machine Learning Model Based on Clinical, CT Spectral and CT Radiomics Features for Predicting KRAS Gene Status in Colorectal Cancer Patients Before Surgery
Objective To explore the value of different machine learning models based on clinical, CT spectral and CT radiomics features for predicting preoperative KRAS gene status in patients with colorectal cancer.
Methods From June 2020 to December 2023, a retrospective study was performed on 204 patients with pathologically confirmed colorectal adenocarcinoma at North China University of Science and Technology Affiliated Hospital. Based on KRAS gene test results, the cases were divided into KRAS wild-type (n=87) and KRAS mutant (n=117) groups. Regions of interest covering the colorectal tumor were drawn on thin-slice venous-phase enhanced images, and all radiomics features were then extracted. The patients were randomly divided into a training group and a test group at a ratio of 7:3, and the least absolute shrinkage and selection operator (LASSO) was used to screen the radiomics features. Support vector machine (SVM), eXtreme Gradient Boosting (XGBoost) and logistic regression (LR) models were constructed to predict the KRAS gene subtype in colorectal cancer patients before surgery. Six models were built in total: SVM, XGBoost and LR models built from radiomics features alone, and SVM, XGBoost and LR models built from the combination of clinical, CT spectral and CT radiomics features. Receiver operating characteristic (ROC) curves were drawn, and the area under the curve (AUC) was calculated to evaluate how well each model predicted the KRAS gene subtype. The DeLong test was used to compare the performance of the six models. The clinical application value of the three machine learning models based on the combination of clinical, CT spectral and CT radiomics features was evaluated with decision curve analysis (DCA).
Results Among the venous-phase spectral parameters, iodine concentration (IC), normalized iodine concentration (NIC) and effective atomic number (Eff-Z) differed significantly between the wild-type and mutant KRAS groups (P<0.05). The two groups showed no significant difference in clinical parameters, including age, sex and serum biochemical markers (P>0.05). Compared with models built from radiomics data alone, the addition of clinical parameters further improved predictive performance. For the SVM model, the AUC values with radiomics data alone and with combined CT spectral and CT radiomics features were 0.810 and 0.866, respectively, and the accuracies were 0.758 and 0.790. For the XGBoost model, the corresponding AUC values were 0.804 and 0.918, and the accuracies were 0.790 and 0.855. For the LR model, the corresponding AUC values were 0.827 and 0.910, and the accuracies were 0.774 and 0.806. Among these, the XGBoost model built from combined CT spectral and CT radiomics features achieved the best AUC, accuracy, sensitivity and specificity, and the differences in AUC values were statistically significant by the DeLong test (P<0.05). DCA showed that, at risk thresholds of 25%-98%, the XGBoost model had the highest net benefit and a wider range of threshold probabilities.
Conclusion Machine learning models based on the combination of venous-phase CT spectral features and CT radiomics features can effectively evaluate KRAS gene status in colorectal cancer patients before surgery, and the XGBoost algorithm performs best.
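The two evaluation measures named in the Methods, AUC and the net benefit used in decision curve analysis, can be illustrated with a minimal sketch. It uses the standard textbook definitions and is not the authors' implementation; the abstract does not state which software was used. The function names (auc_from_scores, net_benefit, net_benefit_treat_all) and the toy labels and scores are hypothetical and are not study data.

```python
# Minimal sketch of two evaluation measures: ROC AUC and DCA net benefit.
# Assumption: label 1 = KRAS mutant, label 0 = KRAS wild type.
# The toy data below are invented for illustration only.


def auc_from_scores(labels, scores):
    """AUC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Net benefit of calling a case positive when score >= threshold:
    TP/n - FP/n * pt / (1 - pt), where pt is the threshold probability."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference curve in DCA: every case is called positive."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy example: 4 mutant and 4 wild-type cases with predicted probabilities.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

print("AUC:", auc_from_scores(labels, scores))
print("Net benefit of model at pt=0.5:", net_benefit(labels, scores, 0.5))
print("Net benefit of treat-all at pt=0.5:", net_benefit_treat_all(labels, 0.5))
```

A decision curve plots net_benefit over a grid of thresholds against the treat-all and treat-none (net benefit 0) references. A model is considered useful at thresholds where its curve lies above both.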