Multimedia tools and applications2024,Vol.83Issue(42) :89817-89836.DOI:10.1007/s11042-024-19009-x

Swin-TransUper: Swin Transformer-based UperNet for medical image segmentation


Jianjian Yin¹, Yi Chen¹, Chengyu Li¹, Zhichao Zheng¹, Yanhui Gu¹, Junsheng Zhou¹

Author information

  • 1. School of Computer and Electronic Information / Artificial Intelligence, Nanjing Normal University, Wenyuan Road, Nanjing 210023, China


Abstract

Convolutional Neural Network-based UNet and its variants have shown remarkable performance in medical image segmentation. However, these methods can only capture local features without spatial correlations and are incapable of global modeling. Previous studies have shown that both local and global features are critical in computer vision. Based on these considerations, this paper proposes a pure Transformer model named Swin-TransUper. Firstly, we explore extending UperNet by incorporating the hierarchical Swin Transformer with shifted windows, thereby enhancing the global modeling capability of the model. Secondly, we introduce an SPPM (Swin Pyramid Pooling Module) to conduct multi-scale feature mining on the deepest features generated by the encoder, fully exploiting their semantic information. Finally, a multi-scale attention module aggregates the multi-scale feature information to obtain a more refined feature map. Our method achieves state-of-the-art DSC (Dice Similarity Coefficient) scores of 80.08%, 90.25%, and 90.62% on the Synapse multi-organ segmentation, ISIC2017, and ACDC datasets, respectively. Moreover, experimental results on the ISIC2017 dataset show that Swin-TransUper achieves the best Sensitivity and Accuracy, at 91.20% and 96.44%, respectively.
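The SPPM described in the abstract builds on the pyramid-pooling idea used in UperNet-style decoders: the deepest encoder feature map is average-pooled at several grid sizes, upsampled back to the input resolution, and concatenated with the input so that multi-scale context is mixed into every location. The sketch below illustrates plain pyramid pooling in NumPy with hypothetical bin sizes (1, 2, 3, 6); the paper's Swin-specific SPPM and its learned projection layers are not reproduced here.

```python
import numpy as np

def adaptive_avg_pool(feat, bins):
    """Average-pool a (H, W, C) feature map into a (bins, bins, C) grid."""
    h, w, c = feat.shape
    out = np.empty((bins, bins, c), dtype=feat.dtype)
    for i in range(bins):
        for j in range(bins):
            # Bin boundaries, mirroring adaptive-pooling semantics.
            h0, h1 = (i * h) // bins, ((i + 1) * h + bins - 1) // bins
            w0, w1 = (j * w) // bins, ((j + 1) * w + bins - 1) // bins
            out[i, j] = feat[h0:h1, w0:w1].mean(axis=(0, 1))
    return out

def nearest_upsample(grid, h, w):
    """Nearest-neighbour upsample a (b, b, C) grid back to (H, W, C)."""
    b = grid.shape[0]
    rows = (np.arange(h) * b) // h
    cols = (np.arange(w) * b) // w
    return grid[rows][:, cols]

def pyramid_pool(feat, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with pooled-and-upsampled context branches."""
    h, w, _ = feat.shape
    branches = [feat] + [
        nearest_upsample(adaptive_avg_pool(feat, b), h, w) for b in bin_sizes
    ]
    return np.concatenate(branches, axis=-1)

feat = np.random.rand(12, 12, 8).astype(np.float32)
out = pyramid_pool(feat)
print(out.shape)  # (12, 12, 40): 8 original + 4 branches x 8 channels
```

In a real decoder each pooled branch would pass through a learned 1x1 convolution before concatenation; the bin-1 branch reduces to the global average, which is what gives the module its image-level context.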

Keywords

Medical image segmentation, Swin Transformer, Swin-TransUper, UperNet, Convolutional neural network


Publication year: 2024
Multimedia Tools and Applications
ISSN: 1380-7501