Abstract
Aim: To rank the factors influencing premature ovarian insufficiency (POI) using machine learning algorithms and to identify the factors with the greatest impact on POI. Methods: Inclusion and exclusion criteria were first established, and 500 patients presenting with irregular menstruation were selected; differences in age and occupation were analyzed across traditional Chinese medicine syndrome types. Six machine learning algorithms (logistic regression, support vector machine, decision tree, random forest, extreme gradient boosting, and K-nearest neighbor) were then used to classify patients as POI or non-POI, and their predictive performance was compared using the Matthews correlation coefficient and AUC. The influencing factors were ranked by the decrease in accuracy and the decrease in Gini impurity computed by the random forest, and the five highest-ranked factors were obtained in combination with stepwise elimination. Results: The random forest algorithm achieved the highest Matthews correlation coefficient, accuracy, and AUC among the six algorithms, at 0.399, 0.717, and 0.908, respectively. The factors influencing POI were history of uterine or pelvic surgery, education level, age, history of weight loss, and history of smoking. Their Borda count scores were, in order, uterine or pelvic surgery history (2.446), education level (2.924), age (4.060), weight loss history (5.303), and smoking history (6.429). Conclusions: Random forest outperformed the other five algorithms in predicting POI. When patient data are insufficient, doctors could use these five factors as a basis for preliminary intervention in patients with irregular menstruation.
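The two quantities the abstract relies on, the Matthews correlation coefficient and a rank-based Borda score, can be illustrated with a minimal sketch. This is not the authors' code. It assumes that the Borda score is the mean rank position of a feature across several rankings (lower meaning more important), which is consistent with the reported order but is not stated in the abstract. The function names `mcc` and `mean_rank_scores`, the feature labels, and the example rankings are all hypothetical.

```python
from math import sqrt
from statistics import mean


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0.0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def mean_rank_scores(rankings):
    """Average 1-based rank position of each feature across rankings.

    Assumption: this mean-rank form stands in for the Borda count score;
    a lower score means the feature is ranked as more important on average.
    """
    features = rankings[0]
    return {f: mean(r.index(f) + 1 for r in rankings) for f in features}


# Hypothetical rankings (most important first), e.g. one per importance
# measure or per resampling run; these are not data from the study.
rankings = [
    ["surgery", "education", "age", "weight_loss", "smoking"],
    ["education", "surgery", "age", "smoking", "weight_loss"],
    ["surgery", "age", "education", "weight_loss", "smoking"],
]

scores = mean_rank_scores(rankings)
ordered = sorted(scores, key=scores.get)  # ascending score = descending importance

if __name__ == "__main__":
    print("MCC for a made-up confusion matrix:", round(mcc(40, 45, 5, 10), 3))
    for name in ordered:
        print(name, round(scores[name], 3))
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it remains informative when the POI and non-POI groups differ in size. Averaging rank positions rather than raw importance values lets measures on different scales, such as decrease in accuracy and decrease in Gini impurity, be combined into one ordering.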