Construction of New Blood Lipid Features Based on Machine Learning and Its Application in Coronary Atherosclerosis
Objective To analyze the blood lipid profile and to develop a machine learning-based method for integrating its components.
Methods A total of 68 patients with coronary atherosclerosis admitted to our hospital from June 2021 to June 2022 were included. Apolipoprotein B (ApoB), non-high-density lipoprotein cholesterol (N-HDL-C), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), total cholesterol (TC), triglyceride (TG), and lipoprotein(a) [Lp(a)] values were collected from each patient's blood lipid profile. Coronary angiography results were reviewed, and each patient's Gensini score was calculated using the modified Gensini scoring method. Based on the relationships among the components of the lipid profile, an interpretable new feature, the cholesterol index, was constructed. The patients were randomly divided into a training set and a test set (3:1). A random forest model was used to assess the predictive value of the cholesterol index for severe coronary atherosclerosis, evaluated by the area under the curve (AUC), F1 score, precision, recall, and accuracy.
Results A total of 68 patients with coronary atherosclerosis were included, 48 males and 20 females, with a mean age of 57.96 ± 11.33 years. There were no significant differences in age, TC, ApoB, N-HDL-C, LDL-C, HDL-C, TG, Lp(a), or cholesterol index between the training set and the test set (P > 0.05). Using the original lipid profile, the AUC of the random forest model for predicting severe coronary atherosclerosis was 0.64 (95% CI: 0.41-0.80). Using the new feature, cholesterol index = √ApoB × (LDL-C + 0.1 × (N-HDL-C - LDL-C)) / HDL-C, the AUC increased to 0.84 (95% CI: 0.57-0.93), and the F1 score, precision, recall, and accuracy also improved to varying degrees, reaching 0.83, 1.00, 0.71, and 0.88, respectively.
Conclusion The cholesterol index can effectively integrate cholesterol data and improve the performance of the random forest model in predicting the severity of coronary atherosclerosis.
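
As an illustration of how the cholesterol index stated in the Results might be computed for a single patient, the following is a minimal Python sketch, not the authors' implementation. It assumes the square root applies to ApoB only, which is one reading of the formula as printed, and that all inputs are given in the units used in the laboratory report. The function name, parameter names, and example values are hypothetical.

```python
import math


def cholesterol_index(apo_b, ldl_c, non_hdl_c, hdl_c, weight=0.1):
    """Compute sqrt(ApoB) * (LDL-C + weight * (N-HDL-C - LDL-C)) / HDL-C.

    Assumption: the square root covers ApoB only. The default weight of
    0.1 is the coefficient given in the abstract's formula.
    """
    if apo_b < 0:
        raise ValueError("ApoB must be non-negative")
    if hdl_c <= 0:
        raise ValueError("HDL-C must be positive")
    # Cholesterol carried by non-HDL particles other than LDL.
    non_hdl_minus_ldl = non_hdl_c - ldl_c
    return math.sqrt(apo_b) * (ldl_c + weight * non_hdl_minus_ldl) / hdl_c


# Hypothetical values for illustration only, not patient data.
example = cholesterol_index(apo_b=1.0, ldl_c=3.0, non_hdl_c=4.0, hdl_c=1.0)
print(round(example, 2))
```

Because HDL-C appears in the denominator, the sketch rejects non-positive HDL-C values rather than returning an undefined result.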