Study on Lesion Segmentation of Melanoma Images Based on Swin-Transformer
The mainstream models for lesion segmentation in melanoma images are mostly based on Convolutional Neural Networks (CNNs) or Vision Transformer (ViT) networks. However, CNN models are limited by the size of their receptive fields and cannot capture global contextual information, while ViT models extract only fixed-resolution features and cannot extract features at different granularities. To address this problem, a hybrid model named SwinTransFuse, based on the Swin-Transformer, is proposed. The model integrates two branches. In the encoding stage, a noise-reduction module first removes noise, such as hair, from the image. A dual-branch feature extraction module composed of a CNN and a Swin-Transformer then extracts the local fine-grained information and global contextual information of the image. SE modules apply channel attention to the global contextual information from the Swin-Transformer branch to enhance global feature extraction, and a CBAM module applies spatial attention to the local fine-grained information from the CNN branch to enhance the extraction of local fine-grained features. Next, the Hadamard product is used to perform feature interaction between the output features of the two branches, achieving feature fusion. Finally, the features output by the SE block, the features output by the CBAM module, and the fused features are concatenated to achieve multilevel feature fusion, and the interactive features are output through a residual block. In the decoding stage, the features are fed into an upsampling module to obtain the final segmentation result. Experimental results show that the mean Intersection over Union (mIoU) values of this model on the ISIC2017 and ISIC2018 skin lesion datasets are 78.72% and 78.56%, respectively, outperforming other medical segmentation models of the same type; the model therefore has high practical value.
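The fusion described above (SE attention on the Swin branch, CBAM attention on the CNN branch, a Hadamard-product interaction, then concatenation) can be sketched as follows. This is a minimal NumPy illustration of the data flow only, not the authors' implementation: the learned weights are random stand-ins, the CBAM spatial gate is simplified to a sigmoid over channel-pooled maps (the paper's module would use a learned convolution), and all shapes are hypothetical.

```python
import numpy as np

def se_channel_attention(x, reduction=4, seed=0):
    """SE-style channel attention on a (C, H, W) feature map (sketch)."""
    c = x.shape[0]
    z = x.mean(axis=(1, 2))                          # squeeze: global average pool -> (C,)
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1   # stand-in for learned FC weights
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))  # ReLU then sigmoid
    return x * s[:, None, None]                      # excitation: channel-wise rescale

def cbam_spatial_attention(x):
    """Simplified CBAM spatial attention on a (C, H, W) feature map (sketch)."""
    avg = x.mean(axis=0, keepdims=True)              # (1, H, W) channel-wise mean
    mx = x.max(axis=0, keepdims=True)                # (1, H, W) channel-wise max
    gate = 1.0 / (1.0 + np.exp(-(avg + mx)))         # sigmoid gate; real CBAM uses a conv here
    return x * gate                                  # per-pixel rescale, broadcast over channels

def fuse(global_feat, local_feat):
    """Multilevel fusion: attended branches, Hadamard interaction, concatenation."""
    g = se_channel_attention(global_feat)            # Swin-Transformer branch -> SE
    l = cbam_spatial_attention(local_feat)           # CNN branch -> CBAM
    h = g * l                                        # Hadamard-product feature interaction
    return np.concatenate([g, l, h], axis=0)         # concat along channels -> (3C, H, W)

x_global = np.ones((8, 16, 16))                      # hypothetical branch outputs
x_local = np.ones((8, 16, 16))
out = fuse(x_global, x_local)
print(out.shape)                                     # (24, 16, 16)
```

The concatenated tensor would then pass through the residual block and the upsampling decoder described in the abstract.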