Double-tier multiple instance learning model for histopathology image classification
Objective Whole slide imaging, which scans a complete microscope slide and converts it into a digital whole slide image (WSI), is an efficient technique for visualizing tissue sections in disease diagnosis, medical education, and pathological research. Analysis of histopathology WSIs is the gold standard for pathology diagnosis. However, analyzing pathological WSIs is a tedious and time-consuming task, and the diagnosis is easily influenced by personal experience. The increasing use of WSIs in histopathology has allowed digital pathology to substantially improve pathologists' workflow and diagnostic decision-making, but it has also increased the need for computer-aided diagnostic tools for WSIs. Many researchers have begun exploring the application of deep learning to pathological image analysis. WSIs possess gigapixel resolution and usually lack pixel-level annotations, whereas existing deep learning techniques were developed for small conventional images. Therefore, applying these techniques directly to WSI analysis is not feasible. Weakly supervised multiple instance learning (MIL) is a powerful method for analyzing WSIs. Its key challenge is to effectively discover, among a massive number of instances, the crucial instances that trigger the prediction and to summarize valuable information from different instances. Previous methods were primarily designed under the independent and identically distributed (i.i.d.) hypothesis, disregarding the relationships among different instances and the heterogeneity of tumors. To solve these problems, a novel double-tier MIL (DT-MIL) model is proposed.

Method The proposed method consists of three parts: 1) preprocessing of WSIs, 2) convolutional neural network (CNN)-based feature encoding, and 3) feature fusion of instance embeddings. First, WSIs are cropped into fixed-size image patches using a sliding window strategy, filtering out invalid background regions and retaining only the foreground areas 
containing pathological tissues. Second, the CNN-based feature encoder encodes the image patches into fixed-length feature embeddings. Lastly, the proposed DT-MIL model is deployed in the feature fusion part. DT-MIL contains two MIL models in series. The Tier-1 MIL model, also called the adaptive feature miner, generates positive and negative internal queries. The Tier-2 MIL model consists of a deep non-linear module and a double-detection cross-attention module. The former maps the instance features in the bag, while the latter generates a bag-level representation for final classification. In particular, Tier-1's adaptive feature miner applies the idea of Grad-CAM to provide a reliable probability distribution over instances under the AB-MIL framework. Thereafter, highly reliable features are retrieved and aggregated to generate an internal query for each subclass. Moreover, the adaptive feature miner flexibly selects K discriminative instances to generate reliable internal queries, which mitigates the constraints of tumor heterogeneity on model performance and avoids introducing false information. In addition, the adaptive feature miner considers both positive and negative instances to prevent a biased decision boundary. Tier-2 aims to produce a robust bag-level representation for the subsequent classifier by simultaneously modeling the relationships among the positive query, the negative query, and the instances in the bag. Aggregating all instances in the bag through their connections to both queries supplements the feature information and keeps the model sensitive to positive and negative instances alike. Consequently, the model is prevented from being biased against negative instances, and its robustness is improved. An in-domain feature encoder pretrained with the self-supervised contrastive learning framework SimCLR is also introduced into the proposed model to generate more robust feature embeddings.

Result This study performs a 
comparison and a series of ablation experiments on two publicly available datasets, namely, CAMELYON-16 and TCGA lung cancer. First, we compared the proposed model with six classical MIL models. Experimental results show that the proposed model performs best and achieves significant improvements in accuracy, precision, and recall. On the CAMELYON-16 dataset, testing accuracy, precision, and recall for binary tumor classification reached 95.35%, 95.91%, and 94.27%, respectively. On the TCGA lung cancer dataset, testing accuracy, precision, and recall for cancer subtype classification reached 91.87%, 91.92%, and 91.83%, respectively. The accuracy of the proposed method is 2.33% and 0.96% higher than that of the state-of-the-art methods on the CAMELYON-16 and TCGA lung cancer datasets, respectively. Second, we conducted ablation experiments on the proposed model to verify the effectiveness of its key components. Experimental results show that sequentially adding the feature extractor, adaptive feature miner, and dual-path cross-detection module improved the accuracy of the model by 31.78%, 3.1%, and 0.78%, respectively. Lastly, we compared the proposed adaptive feature miner with traditional K-means clustering and with Top-K instance aggregation. Experimental results indicate that the adaptive feature miner can flexibly extract discriminative features, thereby generating better internal queries than either alternative.

Conclusion The proposed DT-MIL model simultaneously considers the correlation between instances and tumor heterogeneity. It can better mine the internal feature information of histopathological images and significantly improve detection accuracy. This result demonstrates the effectiveness of the proposed model in pathological diagnosis and in accurately locating the lesion region. These capabilities have high application value in pathology-assisted diagnostic scenarios.
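
As an illustration of the two-tier fusion described in the Method part, the following minimal NumPy sketch may help. It is not the authors' implementation. It assumes that plain attention scores stand in for the Grad-CAM-style instance reliability, that K is fixed rather than chosen adaptively, and that all parameters are random rather than trained. The function names (`tier1_queries`, `tier2_bag_embedding`) and the dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def tier1_queries(H, V, w, k):
    """Tier-1 stand-in: score instances with attention-based MIL pooling
    weights, then average the k highest-scored and k lowest-scored
    instance embeddings into a positive and a negative query.
    (Assumption: the paper derives scores with a Grad-CAM-style step.)"""
    scores = softmax(np.tanh(H @ V) @ w)   # (n,) distribution over instances
    order = np.argsort(scores)             # ascending by score
    q_neg = H[order[:k]].mean(axis=0)      # (d,) from least salient instances
    q_pos = H[order[-k:]].mean(axis=0)     # (d,) from most salient instances
    return q_pos, q_neg, scores

def tier2_bag_embedding(H, q_pos, q_neg):
    """Tier-2 stand-in: each query attends over all instances in the bag
    (scaled dot-product cross-attention); the two pooled vectors are
    concatenated into one bag-level representation."""
    Q = np.stack([q_pos, q_neg])                           # (2, d)
    attn = softmax(Q @ H.T / np.sqrt(H.shape[1]), axis=1)  # (2, n)
    pooled = attn @ H                                      # (2, d)
    return pooled.reshape(-1), attn                        # (2d,), (2, n)

# Toy bag: 50 patch embeddings of dimension 16 (random, for shape checking only).
rng = np.random.default_rng(0)
H = rng.normal(size=(50, 16))
V = rng.normal(size=(16, 8))   # hypothetical attention projection
w = rng.normal(size=8)         # hypothetical attention vector

q_pos, q_neg, scores = tier1_queries(H, V, w, k=5)
bag, attn = tier2_bag_embedding(H, q_pos, q_neg)
print(bag.shape)  # bag-level vector that a classifier head would consume
```

In a trained model the bag vector would be passed to a classifier, and the projections would be learned end to end. The sketch only shows how two internal queries can pool every instance in a bag without discarding the negative evidence.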