
Study on Lesion Segmentation of Melanoma Images Based on Swin-Transformer

Mainstream models for lesion segmentation in melanoma images are mostly based on convolutional neural networks (CNNs) or Vision Transformer (ViT) networks. However, CNN models are limited by the size of their receptive fields and cannot capture global contextual information, while ViT models can only extract features at a fixed resolution and cannot extract features of different granularities. To address this problem, a dual-branch hybrid model based on the Swin-Transformer, named SwinTransFuse, is established. In the encoding stage, a Noise Reduction image-denoising module first removes noise, such as hair, from the image. A dual-branch feature extraction module composed of a CNN and a Swin-Transformer then extracts the local fine-grained information and global context information of the image. A Squeeze-and-Excitation (SE) module performs channel attention on the global context information from the Swin-Transformer branch to enhance global feature extraction, and a Convolutional Block Attention Module (CBAM) performs spatial attention on the local fine-grained information from the CNN branch to enhance local fine-grained feature extraction. Next, a Hadamard product operation performs feature interaction between the outputs of the two branches to fuse the features. Finally, the features output by the SE module, the features output by the CBAM module, and the fused features are concatenated to achieve multilevel feature fusion, and the interacted features are output through a residual block. In the decoding stage, the features are fed into an upsampling module to obtain the final segmentation result. Experimental results show that the model achieves mean Intersection over Union (mIoU) values of 78.72% and 78.56% on the ISIC2017 and ISIC2018 skin disease datasets, respectively, outperforming other medical segmentation models of the same type and demonstrating higher practical value.
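The fusion described above (SE channel attention on the Swin-Transformer branch, CBAM-style spatial attention on the CNN branch, Hadamard-product interaction, concatenation, and a residual projection) can be sketched as follows. This is a minimal NumPy illustration of the data flow, not the authors' implementation: the weights are random placeholders, the CBAM convolution uses a small 3x3 kernel rather than the usual 7x7, and the residual block is simplified to a 1x1 projection with a skip connection.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_channel_attention(x, reduction=4):
    """Squeeze-and-Excitation channel attention on a (C, H, W) feature map."""
    c = x.shape[0]
    z = x.mean(axis=(1, 2))                      # squeeze: global average pool -> (C,)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1  # hypothetical weights
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))    # excitation: bottleneck MLP + sigmoid
    return x * s[:, None, None]                  # channel-wise reweighting

def cbam_spatial_attention(x, k=3):
    """CBAM-style spatial attention: conv over stacked channel-avg/max maps."""
    stacked = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    w = rng.standard_normal((2, k, k)) * 0.1     # hypothetical k x k conv kernel
    pad = k // 2
    p = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1], x.shape[2]
    conv = np.zeros((h, wd))
    for i in range(h):                           # naive 2D convolution
        for j in range(wd):
            conv[i, j] = np.sum(p[:, i:i + k, j:j + k] * w)
    return x * sigmoid(conv)[None, :, :]         # spatial-wise reweighting

def fuse_branches(f_cnn, f_swin):
    """Dual-branch fusion: SE on the Swin branch, CBAM on the CNN branch,
    Hadamard-product interaction, concatenation, 1x1 residual projection."""
    g = se_channel_attention(f_swin)             # enhanced global features
    l = cbam_spatial_attention(f_cnn)            # enhanced local features
    inter = g * l                                # Hadamard-product interaction
    cat = np.concatenate([g, l, inter], axis=0)  # multilevel fusion: (3C, H, W)
    c = f_cnn.shape[0]
    proj = rng.standard_normal((c, 3 * c)) * 0.1  # hypothetical 1x1 projection
    y = np.einsum('oc,chw->ohw', proj, cat)
    return np.maximum(y, 0.0) + inter            # simplified residual connection

f_cnn = rng.standard_normal((8, 16, 16))         # CNN-branch features (C, H, W)
f_swin = rng.standard_normal((8, 16, 16))        # Swin-branch features (C, H, W)
out = fuse_branches(f_cnn, f_swin)
print(out.shape)  # (8, 16, 16)
```

The fused output keeps the per-branch channel count, so it can be passed to the decoder's upsampling stages like an ordinary feature map.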

Swin-Transformer model; melanoma; feature fusion; noise reduction; ISIC2018 dataset

ZHAO Hong, WANG Xiao


School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, Gansu, China


Funding: National Natural Science Foundation of China (62166025); Key Research and Development Program of Gansu Province (21YF5GA073)

2024

Computer Engineering
East China Institute of Computer Technology; Shanghai Computer Society

Indexed in: CSTPCD; Peking University Core Journal List
Impact factor: 0.581
ISSN:1000-3428
Year, Volume (Issue): 2024, 50(8)