
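Construction of a machine learning-based prediction model for postoperative recurrence risk of granulomatous lobular mastitis in the mass stage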

Objective: To use machine learning algorithms to identify risk factors for postoperative recurrence of granulomatous lobular mastitis (GLM) in the mass stage, and to provide a reference for the early identification and prevention of such recurrence.

Methods: Electronic medical records and follow-up data were collected for patients who underwent inpatient surgical treatment in the Department of Breast Diseases, the First Affiliated Hospital of Henan University of Traditional Chinese Medicine, between October 2020 and January 2023 and whose histopathological examination confirmed GLM. A total of 340 postoperative patients with mass-stage GLM who met the inclusion and exclusion criteria were enrolled and divided into a recurrence group and a non-recurrence group according to whether recurrence occurred after surgery. The cases were randomly split into a training set and a test set at a ratio of 7:3. In the training set, recurrence prediction models were built with traditional logistic regression and three machine learning algorithms: artificial neural network, random forest, and extreme gradient boosting (XGBoost). In the test set, model performance was evaluated by sensitivity, specificity, accuracy, positive predictive value, negative predictive value, F1 score, and area under the curve (AUC). The Shapley Additive exPlanation (SHAP) method was used to explore the variables most important to the best-performing model in identifying postoperative recurrence of mass-stage GLM. The optimal risk cutoff of the prediction model was determined by the Youden index, and postoperative patients with mass-stage GLM in the external test set were divided into high-risk and low-risk groups accordingly.

Results: A total of 392 postoperative patients with mass-stage GLM were screened; 52 were removed according to the exclusion criteria, leaving 340 for analysis, of whom 60 were in the recurrence group and 280 in the non-recurrence group. Based on the results of univariate analysis, correlation analysis, and clinically meaningful factors, 12 feature variables with non-zero coefficients were selected to build the prediction models: history of other diseases, number of miscarriages, breastfeeding duration of the affected breast, history of milk stasis, lesion location, degree of nipple inversion, fluctuance, low-density lipoprotein, testosterone, previous antibiotic therapy, previous oral hormone medication, and duration of perioperative traditional Chinese medicine treatment. Logistic regression, artificial neural network, random forest, and XGBoost models were constructed. The accuracy, positive predictive value, and negative predictive value of all four models exceeded 75%. The XGBoost model performed best, with accuracy, specificity, sensitivity, AUC, positive predictive value, negative predictive value, and F1 score of 0.93, 0.99, 0.65, 0.87, 0.92, 0.93, and 0.76, respectively. SHAP analysis showed that the duration of perioperative traditional Chinese medicine treatment, breastfeeding duration of the affected breast, low-density lipoprotein, testosterone, and previous hormone medication were the top five factors influencing the XGBoost model's identification of postoperative recurrence of mass-stage GLM.

Conclusions: Compared with the traditional logistic regression model, the machine learning models for identifying postoperative recurrence of mass-stage GLM all showed better performance, and the XGBoost model performed best. Clinicians can take targeted preventive measures based on the risk factors above to improve the postoperative prognosis of mass-stage GLM.
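The evaluation metrics and the Youden-index cutoff named in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names and the toy labels and probabilities are hypothetical, and the abstract does not say how candidate cutoffs were searched, so the scan over distinct predicted probabilities below is an assumption. It uses only the Python standard library.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn); a case is called 'recurrence' when prob >= threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the threshold-dependent metrics listed in the abstract."""
    sensitivity = _ratio(tp, tp + fn)   # recall on recurrent cases
    specificity = _ratio(tn, tn + fp)   # recall on non-recurrent cases
    ppv = _ratio(tp, tp + fp)           # positive predictive value (precision)
    npv = _ratio(tn, tn + fn)           # negative predictive value
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "accuracy": accuracy, "f1": f1}


def youden_cutoff(y_true: Sequence[int], y_prob: Sequence[float]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    Assumption: every distinct predicted probability is tried as a cutoff,
    and the lowest cutoff wins when several give the same J.
    """
    best_cutoff, best_j = 0.0, -1.0
    for threshold in sorted(set(y_prob)):
        m = classification_metrics(*confusion_counts(y_true, y_prob, threshold))
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:
            best_cutoff, best_j = threshold, j
    return best_cutoff, best_j


# Hypothetical toy data: 1 = recurrence, 0 = no recurrence.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_probs = [0.9, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]

cutoff, j_stat = youden_cutoff(toy_labels, toy_probs)
high_risk = [p >= cutoff for p in toy_probs]  # high-risk vs low-risk grouping
print(cutoff, round(j_stat, 3), sum(high_risk))
```

With an imbalanced outcome such as the 60 recurrent versus 280 non-recurrent cases reported here, accuracy and negative predictive value can stay high even when sensitivity is modest, which is why the abstract reports the full set of metrics rather than accuracy alone.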

Keywords: granulomatous lobular mastitis; mass stage; postoperative recurrence; machine learning; predictive model

Xu Yueyuan (徐月圆), Cheng Xufeng (程旭锋), Liu Qi (刘琪), Cheng Ziye (程梓烨), Meng Bingxin (孟冰心)


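Department of Breast Diseases, the First Affiliated Hospital of Henan University of Traditional Chinese Medicine (Zhengzhou 450000)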

The First Clinical Medical College, Henan University of Traditional Chinese Medicine (Zhengzhou 450046)


2024

Chinese Journal of Bases and Clinics in General Surgery
West China Hospital, Sichuan University


CSTPCD
Impact factor: 0.858
ISSN:1007-9424
Year, volume (issue): 2024, 31(12)