Whole Slide Pathological Image Classification of Breast Cancer Based on Mixed Supervised Learning

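Abstract (translated from Chinese): Objective Self-supervised and weakly supervised learning are effective ways to address the difficulty of annotating breast cancer whole slide pathological images for classification. However, because histopathological images are complex and diverse, pseudo labels generated by self-supervised learning alone may not accurately reflect the true class information of an image. Meanwhile, a single weakly supervised learning method suffers from scarce label information and is easily disturbed while learning from pathological images, which leads to unstable predictions. To this end, a mixed supervised learning method for breast cancer whole slide pathological image classification is proposed. Method First, a model is trained with the MoBY self-supervised framework, which mines the intrinsic structural information of breast cancer pathological images through contrastive learning. Then, weakly supervised multiple instance learning is used to further optimize the self-supervised model and obtain more accurate discriminative instances. Finally, representative key instances are selected from each whole slide, and a Transformer encoder fuses the features of these key instances to strengthen the correlation among different pathological image patches, thereby achieving high-accuracy classification of breast cancer whole slide pathological images. Result In experiments on the public Camelyon-16 breast cancer pathological image dataset, the proposed method improves the area under the curve (AUC) by 2.34% and 2.74% over the best existing weakly supervised and self-supervised methods on this dataset, respectively, which verifies the effectiveness of the proposed mixed supervised learning method. In addition, on the MSK (Memorial Sloan-Kettering) adenocarcinoma pathology external validation dataset, it achieves a performance improvement of 6.26% over the supervised method, which indicates good generalization ability. Conclusion A mixed supervised learning method for breast cancer whole slide pathological image classification is proposed. By integrating MoBY self-supervised contrastive learning with Transformer-based weakly supervised multiple instance learning, it classifies breast cancer whole slide pathological images more accurately.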
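
The key-instance selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how instances are scored or how many are kept, so everything below is an assumption made for illustration: the function names, the top-k rule, the value of `k`, and the patch scores are all hypothetical, and max pooling is shown only because it is the textbook multiple instance learning aggregation. The paper itself fuses the selected instances with a Transformer encoder, which this sketch does not implement.

```python
from typing import List


def select_key_instances(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring patches, best first.

    Top-k selection is an assumed rule; the abstract only says that
    representative key instances are selected from each slide.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # sorted() is stable, so tied scores keep their original patch order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def slide_score_max_pooling(scores: List[float]) -> float:
    """Textbook MIL aggregation: a slide is as positive as its most positive patch."""
    if not scores:
        raise ValueError("a slide needs at least one patch")
    return max(scores)


# Hypothetical tumor probabilities for six patches cut from one slide.
patch_scores = [0.05, 0.91, 0.12, 0.78, 0.33, 0.02]

key_indices = select_key_instances(patch_scores, k=3)
slide_score = slide_score_max_pooling(patch_scores)

print(key_indices)  # [1, 3, 4]
print(slide_score)  # 0.91
```

In a full pipeline, the feature vectors of the patches at `key_indices` would be passed on to the fusion stage instead of the scalar scores used here.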

English title: Whole slide pathological image classification of breast cancer based on mixed supervision learning

English abstract:

Objective Breast cancer is among the most common malignant tumors in women, and its early diagnosis and accurate classification are of great importance. Breast cancer whole slide pathological images serve as an important auxiliary diagnostic tool, and classifying them can help doctors identify tumor types accurately. However, given the complexity and huge data volume of these images, manually annotating a label for each image is time consuming and labor intensive. Researchers have therefore proposed various automated methods for classifying breast cancer whole slide pathological images. Self-supervised and weakly supervised learning are effective ways to tackle this challenge. Self-supervised learning is a type of machine learning that requires no manual annotation of labels: it designs tasks that enable the model to learn feature representations from unlabeled data. Self-supervised learning has achieved remarkable progress in computer vision, but it still faces challenges in breast cancer whole slide pathological image classification. Given the complexity and diversity of pathological images, relying solely on the pseudo labels generated by self-supervised learning may fail to reflect the true class information accurately, which degrades classification performance. Weakly supervised learning, on the other hand, leverages information from image data that lacks fine-grained annotation through methods such as multiple instance learning or label propagation. However, the associated models face limited label information and noise, which affect the model's stability during learning and thus the stability of the prediction results. To overcome the limitations of self-supervised and weakly supervised learning, this paper proposes a mixed supervised learning method for breast cancer whole slide pathological image classification. Integrating MoBY self-supervised contrastive learning with weakly supervised multiple instance learning combines the advantages of both learning paradigms and makes full use of unlabeled and noisily labeled data. In addition, the combination improves classification performance through feature selection and spatial correlation enhancement, which increases robustness.

Method First, the self-supervised MoBY framework was used to train the model on unlabeled pathological image data. MoBY is a self-supervised learning method based on self-reconstruction and contrastive learning that can learn key feature representations from images. This step enables the model to extract useful feature information from unlabeled data and provides better initialization parameters for the subsequent classification task. Then, a weakly supervised approach based on multiple instance learning was used to optimize the model further. Multiple instance learning trains the model on images whose individual patches are not annotated, which matters because accurately annotating the category of each image is often difficult in breast cancer whole slide pathological image classification. It divides images into positive and negative instances on the basis of instance-level labels to train the model. This approach partly solves the problem of limited label information and improves the model's robustness and generalization capability. In the feature selection stage, representative feature vectors were selected from each whole slide image to reduce redundancy and noise, extract the most informative features, and sharpen the model's focus on, and discrimination of, key regions. In addition, a Transformer encoder was used to strengthen the correlation among image patches. The Transformer encoder is a powerful tool for modeling global contextual information in images, and it captures semantic relationships between different regions of an image to further increase classification accuracy. Introducing it into breast cancer whole slide pathological image classification allows better use of global image information and gives the model a better understanding of image structure and context. By combining self-supervised learning, weakly supervised learning, and these additional components, the proposed mixed supervised learning approach achieved high accuracy and robustness in classifying breast cancer whole slide pathological images. In experiments, the method achieved excellent classification results on a dataset of breast cancer whole slide pathological images. The approach offers a powerful tool and technical support for the early diagnosis and accurate classification of breast cancer.

Result The effectiveness of the mixed supervised model was validated through evaluation experiments on the publicly available Camelyon-16 breast cancer pathological image dataset. Compared with the state-of-the-art weakly supervised and self-supervised models on this dataset, the proposed model improved the area under the receiver operating characteristic curve (AUC) by 2.34% and 2.74%, respectively. This finding indicates that the proposed method outperformed the other models in breast cancer whole slide pathological image classification. To further validate its generalization capability, experiments were performed on the MSK external validation dataset. On this dataset, the proposed model showed a performance improvement of 6.26% over the supervised method, which further confirms its strong generalization capability and practicality.

Conclusion The proposed breast cancer whole slide pathological image classification method based on mixed supervision achieved remarkable results in addressing this challenge. By leveraging the advantages of self-supervised learning, weakly supervised learning, and spatial correlation enhancement, the model demonstrated improved classification performance on both the public and the external validation datasets. The method exhibits good generalization capability and offers a viable solution for the early diagnosis and treatment of breast cancer. Future research should further refine and optimize the proposed method to increase its accuracy and robustness in breast cancer whole slide pathological image classification. This work helps address the challenges of breast cancer pathological image classification and contributes to the development of early breast cancer diagnosis and treatment.
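
The results above are reported as the area under the receiver operating characteristic curve (AUC). As a reminder of what that number measures, here is a minimal sketch that computes AUC as the probability that a randomly chosen positive slide receives a higher score than a randomly chosen negative slide, with ties counted as one half. The function name `roc_auc`, the labels, and the scores are hypothetical and are not data from the paper. The abstract does not state whether the reported gains are absolute percentage points or relative changes, so the sketch makes no assumption about that.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores.

    Equivalent to the normalized Mann-Whitney U statistic; quadratic in
    the number of samples, which is fine for a small illustration.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical slide-level labels (1 = tumor) and predicted scores.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.3, 0.4, 0.5, 0.7, 0.1]

auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.889
```

Eight of the nine positive-negative pairs are ordered correctly in this toy example, which gives 8/9. A model that ranks every positive above every negative scores 1.0, and random scoring gives about 0.5.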

English keywords:

breast cancer whole slide pathology image; classification; mixed supervised learning; feature fusion; Transformer

Authors:

Zhang Jianxin (张建新), Gao Chengyang (高程阳), Sun Jian (孙鉴), Ding Xueyan (丁雪妍), Liu Bin (刘斌)

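Author affiliations:

School of Computer Science and Engineering, Dalian Minzu University, Dalian 116600

International School of Information and Software, Dalian University of Technology, Dalian 116620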

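Keywords (translated from Chinese):

breast cancer whole slide pathological image; classification; mixed supervised learning; feature fusion; Transformer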

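Funding:

National Natural Science Foundation of China; Liaoning Province Basic Research Project; Liaoning Province Doctoral Research Start-up Fund; Fundamental Research Funds for the Central Universities

Grant numbers:

61972062; 2023JH2/101300191; 2023-BS-078; 04442024042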

Publication year:

2024

DOI:

10.11834/jig.230343

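Journal of Image and Graphics (中国图象图形学报)

Institute of Remote Sensing Applications, Chinese Academy of Sciences; China Society of Image and Graphics; Beijing Institute of Applied Physics and Computational Mathematics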

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 1.111

ISSN: 1006-8961

Year, volume (issue): 2024, 29(9)