Analysis of influencing factors of premature ovarian insufficiency based on 6 machine learning algorithms
Aim: To rank the influencing factors of premature ovarian insufficiency (POI) using machine learning algorithms and to identify the factors with the greatest impact on POI. Methods: Inclusion and exclusion criteria were first established, and 500 patients with abnormal menstruation were selected; differences in age and occupation were analyzed by traditional Chinese medicine syndrome type. Six machine learning algorithms (logistic regression, support vector machine, decision tree, random forest, extreme gradient boosting, and K-nearest neighbor) were then used to predict and classify POI, and their predictive performance was compared using the Matthews correlation coefficient, accuracy, and AUC. The influencing factors of POI were ranked by the reduction in accuracy and in Gini impurity in the random forest, and the top 5 factors were obtained by stepwise elimination. Results: The random forest algorithm achieved the highest Matthews correlation coefficient, accuracy, and AUC, which were 0.399, 0.717, and 0.908, respectively. The influencing factors of POI were uterine or pelvic surgery history, education level, age, weight loss history, and smoking history. The Borda count scores for the 5 factors were: uterine or pelvic surgery history (2.446), education level (2.924), age (4.060), weight loss history (5.303), and smoking history (6.429). Conclusions: The random forest algorithm outperformed the other 5 algorithms in predicting POI. When patient data are insufficient, clinicians could use these 5 factors as preliminary indicators for intervening in patients with irregular menstruation.
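The two scoring ideas named in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: the Matthews correlation coefficient is computed from binary confusion-matrix counts using its standard definition, and the Borda-style score is taken to be the mean rank position of a factor across several rankings (so a lower score means a higher rank). The function names `mcc` and `mean_rank_scores`, the confusion counts, and the example rankings are all hypothetical and are not taken from the study.

```python
import math
from collections import defaultdict


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def mean_rank_scores(rankings):
    """Average 1-based rank position of each item across several rankings.

    Assumes every ranking lists the same items, best first.
    """
    totals = defaultdict(int)
    for ranking in rankings:
        for position, item in enumerate(ranking, start=1):
            totals[item] += position
    return {item: total / len(rankings) for item, total in totals.items()}


# Hypothetical confusion counts (not from the study).
example_mcc = mcc(tp=40, tn=45, fp=5, fn=10)

# Hypothetical rankings of three factors from three repeated runs.
rankings = [
    ["surgery", "education", "age"],
    ["education", "surgery", "age"],
    ["surgery", "age", "education"],
]
scores = mean_rank_scores(rankings)
best_factor = min(scores, key=scores.get)

print(round(example_mcc, 3))
print(scores, best_factor)
```

Under the mean-rank assumption, fractional scores such as those reported for the 5 factors would arise naturally from averaging integer rank positions over many rankings, although the abstract does not describe the exact aggregation used.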