Objective: To construct multi-class machine learning models for preoperatively predicting the HER2 expression status of breast cancer patients using mammography and ultrasound dual-modal imaging features.

Methods: A total of 632 female breast cancer patients meeting the inclusion criteria were enrolled, including 141 cases with HER2 non-expression, 311 with low expression, and 180 with high expression. Imaging features of breast cancer lesions were extracted from the FFDM, DBT, and US images of the study subjects. After data preprocessing, three-class prediction models for HER2 expression status were constructed based on five machine learning algorithms. The models were trained with five-fold cross-validation, and the overall accuracy, precision, recall, F1-score, and AUC of each three-class model were calculated. The bootstrap method was used to compare AUCs between models in order to select the best-performing three-class model. The SHAP method was employed to assess the importance of each feature in predicting HER2 expression status.

Results: In the test set, the random forest (RF) model performed best in the three-class classification of HER2 expression status, with a macro-average AUC of 0.723, a micro-average AUC of 0.783, an overall accuracy of 57.4%, and a macro-average recall of 53.0%. SHAP analysis revealed that the five most important global features influencing the output of the RF model were, in descending order: calcifications observed on X-ray, diastolic blood pressure, linear or branching calcifications, maximum lesion diameter measured by US, and CA153.

Conclusion: The machine learning multi-class prediction model based on mammographic and ultrasound dual-modal imaging features can predict the HER2 status of breast cancer patients preoperatively. Combining it with SHAP values enhances the interpretability of the machine learning model.
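The macro- and micro-average AUCs reported in the Results are one-vs-rest aggregates over the three HER2 classes. A minimal, dependency-free sketch of how these two aggregates can be computed from a model's predicted class probabilities follows; the labels and probability rows below are illustrative placeholders, not the study's data:

```python
from itertools import product

def binary_auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

def macro_micro_auc(y_true, proba, n_classes):
    """One-vs-rest macro- and micro-average AUC for a multi-class model.
    macro: mean of per-class AUCs; micro: AUC over all pooled (label, score) pairs."""
    per_class = []
    pooled_labels, pooled_scores = [], []
    for k in range(n_classes):
        labels = [1 if y == k else 0 for y in y_true]
        scores = [p[k] for p in proba]
        per_class.append(binary_auc(labels, scores))
        pooled_labels += labels
        pooled_scores += scores
    macro = sum(per_class) / n_classes
    micro = binary_auc(pooled_labels, pooled_scores)
    return macro, micro

# Toy example: 0 = non-expression, 1 = low, 2 = high (hypothetical values).
y_true = [0, 0, 1, 1, 2, 2]
proba = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1], [0.2, 0.6, 0.2],
         [0.1, 0.1, 0.8], [0.2, 0.2, 0.6]]
macro, micro = macro_micro_auc(y_true, proba, 3)  # both 1.0: scores rank perfectly
```

With a real classifier the two averages generally differ, as in the reported 0.723 (macro) versus 0.783 (micro): micro-averaging weights the pooled decisions, so it is pulled toward the majority class (here, low expression).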
Keywords: Breast Cancer; HER2 Status; Mammography; Ultrasound; Machine Learning
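For a multi-class tree model, SHAP implementations typically return one matrix of per-sample attributions per output class (e.g. `shap.TreeExplainer(model).shap_values(X)`). The global ranking described in the Results (most important features overall) is conventionally the mean absolute SHAP value per feature across samples and classes. A small aggregation sketch under that layout assumption, with illustrative feature names and toy values rather than the study's data:

```python
def global_shap_importance(shap_values, feature_names):
    """Rank features by mean |SHAP| over all samples and output classes.

    shap_values: list of per-class matrices, each n_samples x n_features
    (assumed layout for a multi-class model's SHAP output).
    Returns (feature_name, importance) pairs in descending order.
    """
    n_features = len(feature_names)
    totals = [0.0] * n_features
    n_rows = 0
    for class_matrix in shap_values:
        for row in class_matrix:
            for j in range(n_features):
                totals[j] += abs(row[j])
            n_rows += 1
    importance = [t / n_rows for t in totals]
    return sorted(zip(feature_names, importance),
                  key=lambda pair: pair[1], reverse=True)

# Toy input: 2 classes, 3 samples, 3 hypothetical features.
names = ["calcification", "diastolic_bp", "lesion_diameter"]
toy = [
    [[0.30, -0.05, 0.10], [-0.25, 0.02, -0.08], [0.28, 0.01, 0.12]],
    [[-0.27, 0.03, -0.09], [0.26, -0.04, 0.11], [-0.29, 0.02, -0.10]],
]
ranked = global_shap_importance(toy, names)  # calcification ranks first
```

Taking the absolute value before averaging matters: SHAP attributions are signed (pushing the prediction up or down), so raw means would cancel and hide consistently influential features.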