Application of U-Net channel transformation network in gland image segmentation
Objective Adenocarcinoma is a malignant tumor originating from the glandular epithelium and poses immense harm to human health. With the rapid development of computer vision technology, medical imaging has become an important means for expert preoperative diagnosis. In the diagnosis of adenocarcinoma, doctors judge the severity of the cancer and grade it by analyzing the size, shape, and other external features of the glandular structure. Accordingly, achieving high-precision segmentation of glandular images has become an urgent requirement in clinical medicine. Glandular medical image segmentation refers to the process of separating the glandular region from the surrounding tissue in medical images, and it requires high segmentation accuracy. Traditional models for segmenting glandular medical images can suffer from problems such as imprecise segmentation and mis-segmentation owing to the diverse shapes of glands and the presence of numerous small targets. To address this issue, this study proposes an improved glandular medical image segmentation algorithm based on UCTransNet. UCTransNet addresses the semantic gap between the different-resolution modules of the encoder and between the encoder and decoder, thereby achieving high-precision image segmentation.

Method First, a fusion of the ASPP_SE and ConvBatchNorm modules is added to the front end of the encoder. The ASPP_SE module combines the ASPP module with a channel attention mechanism. The ASPP module consists of three atrous convolutions with different dilation rates, a 1 × 1 convolution, and ASPP pooling. Atrous convolution injects holes into standard convolution to expand the receptive field and obtain dense data features while keeping the output feature map the same size. The ASPP module uses multi-scale atrous convolution to obtain a large receptive field and fuses the resulting features with the global features obtained from ASPP pooling, yielding denser semantic information than the original features. The channel
attention mechanism enables the model to focus on important channel regions in the image, dynamically select information, and assign large weights to channels containing important information. In the CCT (channel cross fusion with Transformer), modules carrying higher weights of important information achieve better fusion. The ConvBatchNorm module enhances the ability of the encoder to extract the features of small targets while preventing overfitting during model training. Second, a simplified dense connection is embedded between the encoder and the skip connections, and the CCT in the model performs global fusion of the features extracted by the encoder from a channel perspective. Although the global attention ability of the CCT is strong, its local attention ability is weak, and the ambiguity between adjacent encoder modules remains unsolved. To solve this problem, a dense connection is added to enhance the fusion of local information. The dense connection passes the upper encoder module through convolution and pooling to obtain the lower encoder module, then upsamples the lower encoder module so that its resolution matches that of the upper one. The two encoder modules are concatenated along the channel dimension, and the resolution does not change after concatenation. After concatenation, the upper encoder module obtains supplementary feature information from the lower encoder module. Consequently, the semantic fusion between adjacent modules is enhanced, the semantic gap between adjacent encoder modules is reduced, and the fusion of feature information between adjacent encoder modules is improved. A refiner is added to the CCT; it projects the self-attention map to a higher dimension and uses head convolution to enhance the spatial context and local patterns of the attention map. This method effectively combines the advantages of self-attention and convolution to further improve the self-attention mechanism. Lastly, a linear
projection is used to restore the attention map to the initial resolution, thereby enhancing the global fusion of encoder feature information. In summary, the fused ASPP_SE and ConvBatchNorm modules are added to the front end of the UCTransNet encoder to enhance its ability to extract small-target features and to prevent overfitting; a simplified dense connection is embedded between the encoder and the skip connections to enhance the fusion of adjacent module features; and a refinement module is added to the CCT to project the self-attention map to a higher dimension, thereby enhancing the global feature fusion ability of the encoder. The combination of the simplified dense connection and the CCT refinement module improves the performance of the model.

Result The improved algorithm was tested on the publicly available gland data sets MoNuSeg and GlaS. The Dice and intersection over union (IoU) coefficients were the main evaluation metrics. The Dice coefficient is a similarity measure that represents the similarity between two samples, whereas the IoU coefficient measures the accuracy of the positional information of the result. Both metrics are commonly used in medical image segmentation. The test results on the MoNuSeg data set were 80.55% and 67.32%, while those on the GlaS data set were 92.23% and 86.39%. These results represent improvements of 0.88% and 1.06%, and 1.53% and 2.43%, respectively, compared with those of the original UCTransNet. The improved model was also compared with existing popular segmentation networks and was found to generally outperform them.

Conclusion The proposed model is superior to existing segmentation algorithms in medical gland segmentation and can meet the requirements of clinical medical gland image segmentation. The CCT module in the original model was further optimized to fuse global and local feature information, thereby achieving better results.
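The atrous-convolution behavior described in the Method section — expanding the receptive field while keeping the output feature map the same size — can be sketched as below. This is an illustrative NumPy sketch, not the paper's implementation; the function names and the 3 × 3 square-kernel assumption are ours:

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    """'Same'-size dilated (atrous) cross-correlation: padding equals
    dilation * (k - 1) / 2, so the output keeps the input's spatial size."""
    k = kernel.shape[0]                     # square kernel, odd size
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            for a in range(k):
                for b in range(k):
                    # sampled taps are `dilation` pixels apart ("holes")
                    out[i, j] += kernel[a, b] * xp[i + a * dilation,
                                                   j + b * dilation]
    return out

def receptive_field(k, dilation):
    """Effective receptive field of one dilated convolution layer."""
    return k + (k - 1) * (dilation - 1)
```

With a 3 × 3 kernel, dilation rate 6 already covers a 13 × 13 region in a single layer, while the output resolution stays unchanged — which is why stacking a few dilation rates in the ASPP branch gives multi-scale context cheaply.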
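The simplified dense connection — upsampling the lower encoder feature to the upper module's resolution and concatenating along the channel dimension — might look like the following sketch; the channel counts and the nearest-neighbour upsampling choice are assumptions for illustration:

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map,
    bringing the lower encoder module up to the upper one's resolution."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def dense_skip(upper, lower):
    """Concatenate the upper encoder feature with the upsampled lower
    feature along channels; spatial resolution is unchanged."""
    up = upsample2x(lower)
    assert up.shape[1:] == upper.shape[1:], "resolutions must match"
    return np.concatenate([upper, up], axis=0)

# Hypothetical shapes: a 64-channel upper feature at 32x32 and a
# 128-channel lower feature at 16x16.
upper = np.random.rand(64, 32, 32)
lower = np.random.rand(128, 16, 16)
fused = dense_skip(upper, lower)   # shape (192, 32, 32)
```

The concatenation changes only the channel count, so the upper module gains the lower module's semantics without any loss of spatial resolution — exactly the property the abstract relies on for reducing the semantic gap between adjacent encoder modules.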
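The two evaluation metrics reported in the Result section can be computed on binary masks as follows; this is the standard formulation, not code from the paper:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|): overlap-based similarity."""
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def iou_coefficient(pred, target, eps=1e-7):
    """IoU = |A ∩ B| / |A ∪ B|: accuracy of positional overlap."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)

# Toy example: a two-pixel prediction against a one-pixel ground truth.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
```

For this toy pair the Dice coefficient is 2/3 and the IoU is 1/2; Dice weights the intersection twice, which is why Dice scores run higher than IoU on the same masks (e.g. 92.23% vs. 86.39% on GlaS above).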
Keywords: medical image segmentation; U-Net from a channel-wise perspective with Transformer (UCTransNet); dense connection; self-attention mechanism; refinement module