Objective: To construct multi-class machine learning models for preoperatively predicting the HER2 expression status of breast cancer patients using mammography and ultrasound dual-modal imaging features.

Methods: A total of 632 female breast cancer patients meeting the inclusion criteria were enrolled, including 141 cases with HER2 non-expression, 311 with low expression, and 180 with high expression. Imaging features of breast cancer lesions were extracted from the FFDM, DBT, and US images of the study subjects. After data preprocessing, three-class prediction models for HER2 expression status were constructed based on five machine learning algorithms. The models were trained with five-fold cross-validation, and the overall accuracy, precision, recall, F1-score, and AUC of each three-class model were calculated. The bootstrap method was used to compare AUCs between models in order to select the best-performing three-class model. The SHAP method was employed to assess the importance of each feature in predicting HER2 expression status.

Results: In the test set, the random forest (RF) model performed best in the three-class classification of HER2 expression status, with a macro-average AUC of 0.723, a micro-average AUC of 0.783, an overall accuracy of 57.4%, and a macro-average recall of 53.0%. SHAP analysis revealed that the five most important global features influencing the output of the RF model were, in descending order: calcifications observed on X-ray, diastolic blood pressure, linear or branching calcifications, maximum lesion diameter measured by US, and CA153.

Conclusion: The machine learning multi-class prediction model based on mammographic and ultrasound dual-modal imaging features can predict the HER2 status of breast cancer patients preoperatively. Combining it with SHAP values enhances the interpretability of the machine learning model.
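The macro- and micro-average AUCs reported in the Results are one-vs-rest aggregates over the three HER2 classes. A minimal, dependency-free sketch of how these two aggregates can be computed from a model's predicted class probabilities follows; the labels and probability rows below are illustrative placeholders, not the study's data:

```python
from itertools import product

def binary_auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

def macro_micro_auc(y_true, proba, n_classes):
    """One-vs-rest macro- and micro-average AUC for a multi-class model.
    macro: mean of per-class AUCs; micro: AUC over all pooled (label, score) pairs."""
    per_class = []
    pooled_labels, pooled_scores = [], []
    for k in range(n_classes):
        labels = [1 if y == k else 0 for y in y_true]
        scores = [p[k] for p in proba]
        per_class.append(binary_auc(labels, scores))
        pooled_labels += labels
        pooled_scores += scores
    macro = sum(per_class) / n_classes
    micro = binary_auc(pooled_labels, pooled_scores)
    return macro, micro

# Toy example: 0 = non-expression, 1 = low, 2 = high (hypothetical values).
y_true = [0, 0, 1, 1, 2, 2]
proba = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1], [0.2, 0.6, 0.2],
         [0.1, 0.1, 0.8], [0.2, 0.2, 0.6]]
macro, micro = macro_micro_auc(y_true, proba, 3)  # both 1.0: scores rank perfectly
```

With a real classifier the two averages generally differ, as in the reported 0.723 (macro) versus 0.783 (micro): micro-averaging weights the pooled decisions, so it is pulled toward the majority class (here, low expression).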
Keywords: Breast Cancer; HER2 Status; Mammography; Ultrasound; Machine Learning
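For a multi-class tree model, SHAP implementations typically return one matrix of per-sample attributions per output class (e.g. `shap.TreeExplainer(model).shap_values(X)`). The global ranking described in the Results (most important features overall) is conventionally the mean absolute SHAP value per feature across samples and classes. A small aggregation sketch under that layout assumption, with illustrative feature names and toy values rather than the study's data:

```python
def global_shap_importance(shap_values, feature_names):
    """Rank features by mean |SHAP| over all samples and output classes.

    shap_values: list of per-class matrices, each n_samples x n_features
    (assumed layout for a multi-class model's SHAP output).
    Returns (feature_name, importance) pairs in descending order.
    """
    n_features = len(feature_names)
    totals = [0.0] * n_features
    n_rows = 0
    for class_matrix in shap_values:
        for row in class_matrix:
            for j in range(n_features):
                totals[j] += abs(row[j])
            n_rows += 1
    importance = [t / n_rows for t in totals]
    return sorted(zip(feature_names, importance),
                  key=lambda pair: pair[1], reverse=True)

# Toy input: 2 classes, 3 samples, 3 hypothetical features.
names = ["calcification", "diastolic_bp", "lesion_diameter"]
toy = [
    [[0.30, -0.05, 0.10], [-0.25, 0.02, -0.08], [0.28, 0.01, 0.12]],
    [[-0.27, 0.03, -0.09], [0.26, -0.04, 0.11], [-0.29, 0.02, -0.10]],
]
ranked = global_shap_importance(toy, names)  # calcification ranks first
```

Taking the absolute value before averaging matters: SHAP attributions are signed (pushing the prediction up or down), so raw means would cancel and hide consistently influential features.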