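Abstract (translated from the Chinese)
Objective: To investigate the correlation between breast magnetic resonance imaging (MRI) features and the 21-gene recurrence score (RS), and to build RS prediction models.
Methods: Data were collected from patients with estrogen receptor (ER)-positive, human epidermal growth factor receptor 2 (HER2)-negative breast cancer who underwent 21-gene testing at Fudan University Shanghai Cancer Center between April 2017 and March 2019. Patients with a preoperative MRI examination were screened for enrollment. Using RS = 26 as the cutoff, patients were divided into a high-risk group (RS ≥ 26) and a low-risk group (RS < 26). Images were assessed according to the 2013 edition of the Breast Imaging Reporting and Data System. Univariate tests were used to compare MRI features between the RS groups, and multivariate logistic regression was used to build an RS prediction model. Patients were split into a training group and a validation group at a 7:3 ratio. Features were selected by Pearson correlation coefficient screening and recursive feature elimination, resampling was performed with the synthetic minority oversampling technique, and models were built with four machine learning algorithms (linear support vector machine, random forest, decision tree, and K-nearest neighbor). Model performance was evaluated with receiver operating characteristic (ROC) curves.
Results: A total of 159 patients were enrolled (58 in the low-risk group and 101 in the high-risk group). Among the clinicopathological features, progesterone receptor (PR) expression status differed significantly between groups (P = 0.017), with a higher proportion of PR-positive patients in the low-risk group. Among the MRI features, mass margin differed significantly between groups (P = 0.008): masses in the low-risk group mostly had spiculated margins (64.8%), whereas masses in the high-risk group mostly had irregular margins (54.7%). When PR status and mass margin were entered into the multivariate logistic regression model, PR-positive patients had a lower recurrence risk than PR-negative patients (OR = 0.110, P = 0.038), and masses with spiculated margins had a lower recurrence risk than masses with irregular margins (OR = 0.343, P = 0.004). The logistic regression model had an area under the curve (AUC) of 0.67, was well calibrated, and showed some clinical utility. After the 7:3 split, the training group included 111 patients (34 low-risk, 77 high-risk) and the validation group included 48 patients (24 low-risk, 24 high-risk). The AUCs of the four machine learning models ranged from 0.64 to 0.69, with the support vector machine and random forest models performing relatively better.
Conclusion: MRI has potential value in assessing recurrence risk in patients with ER+/HER2- breast cancer.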
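As a reading aid for the odds ratios reported above, the following minimal sketch converts each OR into the logistic regression coefficient it implies (beta = ln OR) and shows how an OR rescales a baseline probability. It assumes the modeled outcome is membership in the high-risk group (RS ≥ 26). The baseline probability of 0.70 is a hypothetical value chosen only for illustration and is not reported in the study; the function names are likewise ours.

```python
import math


def or_to_coefficient(odds_ratio: float) -> float:
    """Logistic regression coefficient implied by an odds ratio: beta = ln(OR)."""
    return math.log(odds_ratio)


def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Probability obtained after multiplying the baseline odds by an odds ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Odds ratios reported in the abstract.
OR_PR_POSITIVE = 0.110  # PR-positive vs. PR-negative
OR_SPICULATED = 0.343   # spiculated vs. irregular margin

# Hypothetical baseline probability of RS >= 26 (NOT taken from the study).
HYPOTHETICAL_BASELINE = 0.70

if __name__ == "__main__":
    for label, odds_ratio in [("PR-positive", OR_PR_POSITIVE),
                              ("spiculated margin", OR_SPICULATED)]:
        beta = or_to_coefficient(odds_ratio)
        prob = apply_odds_ratio(HYPOTHETICAL_BASELINE, odds_ratio)
        print(f"{label}: beta = {beta:.3f}, "
              f"probability {HYPOTHETICAL_BASELINE:.2f} -> {prob:.2f}")
```

An OR below 1 corresponds to a negative coefficient, so both PR positivity and a spiculated margin shift the predicted probability of a high RS downward, whatever the baseline.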
Abstract
Objective: To explore the association between magnetic resonance imaging (MRI) features and the 21-gene recurrence score (RS), and to establish RS prediction models.
Methods: Clinical and imaging data were collected from patients with estrogen receptor (ER)-positive, human epidermal growth factor receptor 2 (HER2)-negative breast cancer who underwent the 21-gene expression assay at Fudan University Shanghai Cancer Center from April 2017 to March 2019. Patients who had undergone preoperative breast MRI were selected. MRI images were evaluated according to the 2013 version of the Breast Imaging Reporting and Data System lexicon. Univariate analyses were used to compare MRI features between the high-risk group (RS ≥ 26) and the low-risk group (RS < 26), and multivariate logistic regression was used to construct a prediction model. The patients were divided into a training group and a validation group in a 7:3 ratio. Pearson correlation coefficient screening and recursive feature elimination were used for feature selection, and the synthetic minority oversampling technique was used to balance the training dataset. Four machine learning algorithms (linear support vector machine, random forest, decision tree, and K-nearest neighbor) were used to construct the models, and model performance was evaluated with receiver operating characteristic (ROC) curves.
Results: A total of 159 patients were enrolled, with 58 in the low-risk group and 101 in the high-risk group. Among the clinical characteristics, progesterone receptor (PR) status differed significantly between groups (P = 0.017), and the proportion of PR-positive patients was higher in the low-risk group than in the high-risk group. Among the MRI characteristics, the distribution of tumor margins differed significantly between groups (P = 0.008). Tumors in the low-risk group were mostly characterized by spiculated margins (64.8%), and tumors in the high-risk group were mostly characterized by irregular margins (54.7%). When PR status and tumor margin were incorporated into the multivariate logistic regression model, PR-positive patients had a lower risk of recurrence than PR-negative patients (OR = 0.110, P = 0.038), and tumors with spiculated margins had a lower risk of recurrence than tumors with irregular margins (OR = 0.343, P = 0.004). The area under the curve (AUC) of the model was 0.667. The calibration curve and decision curve indicated that the model was well calibrated and had some clinical utility. The training group included 111 patients (34 low-risk and 77 high-risk), and the validation group included 48 patients (24 low-risk and 24 high-risk). The AUCs of the four machine learning models ranged from 0.64 to 0.69, and the AUCs of the support vector machine and random forest models were relatively higher.
Conclusion: Breast MRI features have a potential role in assessing the recurrence risk of patients with ER+/HER2- breast cancer.
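Two steps named in the Methods, minority oversampling and ROC-based evaluation, can be illustrated with a short standard-library sketch. It is not the authors' implementation, which the abstract does not describe. `smote_sketch` shows only the core interpolation idea behind the synthetic minority oversampling technique, and `roc_auc` computes the AUC as the fraction of (positive, negative) pairs ranked correctly. All feature values and scores below are toy data invented for illustration, and both function names are hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # All other minority samples, sorted by squared Euclidean distance to base.
        others = [minority[j] for j in range(len(minority)) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs in which the positive
    sample receives the higher score (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy two-feature minority samples (hypothetical, not study data).
    toy_low_risk = [(0.2, 1.0), (0.4, 1.2), (0.3, 0.8), (0.5, 1.1)]
    print(smote_sketch(toy_low_risk, n_new=3, k=2))
    # Toy labels (1 = high risk) and model scores (hypothetical).
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In the reported training group (34 low-risk vs. 77 high-risk), the low-risk class is the minority. If the target were a 1:1 balance, which the abstract does not specify, 43 synthetic low-risk samples would be needed. Oversampling is applied to the training data only, so the validation AUC reflects unmodified cases.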