Dual-encoder global-local cross-attention network for medical image segmentation
Objective With the rapid advancement of medical imaging technology, medical image segmentation has become a popular topic in the field of medical image processing and has been the subject of extensive study. Medical image segmentation has a wide range of applications and research value in medical research and practice. The segmentation results of medical images can be used by physicians to determine the location, size, and shape of lesions, providing an accurate basis for diagnosis and treatment. In recent years, UNet, based on convolutional neural networks (CNNs), has become a baseline architecture for medical image segmentation. However, this architecture cannot effectively extract global context information because of the limited receptive field of CNNs. The Transformer was originally designed to solve this problem but is limited in capturing local information. Therefore, hybrid CNN-Transformer networks built on the UNet architecture are gradually becoming popular. However, existing methods have some shortcomings. For example, they typically cannot effectively combine the global and local information extracted by the CNN and the Transformer. In addition, although the original skip connection can recover some of the location information lost by the target features in the downsampling stage, it may fail to capture all the fine-grained details, ultimately affecting the accuracy of the predicted segmentation. This paper proposes a dual-encoder global-local cross-attention network with CNN and Transformer (DGLCANet) to address these issues.

Method First, a dual-encoder network that combines the advantages of CNNs and Transformer networks is adopted to extract rich local and global information from the images. In the encoder stage, the Transformer and CNN branches extract global and local information, respectively. In addition, the CSWin Transformer, which has a low computational cost, is used in the Transformer branch to reduce the computational cost of the model. Next, a global-local cross-attention Transformer module is proposed to fully utilize the global and local information extracted by the two encoder branches. The core of this module is the cross-attention mechanism, which captures the correlation between global and local features by exchanging information between the two branches. Finally, a feature adaptation block is designed in the skip connections of DGLCANet to compensate for the shortcomings of the original skip connections. The feature adaptation module adaptively matches the features between the encoder and decoder, reducing the feature gap between them and improving the adaptive capability of the model. Meanwhile, the module can also recover detailed positional information lost during the encoder downsampling process. Tests are performed on four public datasets: ISIC-2017, ISIC-2018, BUSI, and the 2018 Data Science Bowl. Among them, ISIC-2017 and ISIC-2018 are dermoscopic image datasets for melanoma detection, containing 2,000 and 2,596 images, respectively. The BUSI dataset, which contains 780 images, is a breast ultrasound dataset for detecting breast cancer. The 2018 Data Science Bowl dataset, which contains 670 images, is used for examining cell nuclei in different microscope images. The resolution of all images is set to 256 × 256 pixels, and the images are randomly divided into training and test sets at a ratio of 8:2. DGLCANet is implemented in the PyTorch framework and trained on an NVIDIA GeForce RTX 3090Ti GPU with 24 GB of memory. In the experiments, the binary cross-entropy and Dice loss functions are mixed in proportion to construct a new loss function. Furthermore, the Adam optimizer is employed with an initial learning rate of 0.001, a momentum parameter of 0.9, and a weight decay of 0.0001.

Result In this study, four evaluation metrics, namely, intersection over union (IoU), Dice coefficient (Dice), accuracy, and recall, are used to evaluate the effectiveness of the proposed method. In theory, larger values of these metrics indicate better segmentation. Experimental results show that on the four datasets, the Dice coefficient reaches 91.88%, 90.82%, 80.71%, and 92.25%, which are 5.87%, 5.37%, 4.65%, and 2.92% higher than those of the classic UNet, respectively. Compared with recent state-of-the-art methods, the proposed method also demonstrates its superiority. Furthermore, the visualized results show that the proposed method effectively predicts the boundary area of the image and distinguishes the lesion area from the normal area. Meanwhile, compared with other methods, the proposed method can still achieve better segmentation results under multiple interference factors, such as brightness variations, producing predictions remarkably close to the ground truth. The results of a series of ablation experiments also show that each of the proposed components delivers satisfactory performance.

Conclusion In this study, a dual-encoder medical image segmentation method that integrates a global-local attention mechanism is proposed. The experimental results demonstrate that the proposed method not only improves segmentation accuracy but also obtains satisfactory segmentation results when processing complex medical images. Future work will focus on further optimization and in-depth research to promote the practical application of this method and contribute to important breakthroughs and advancements in the field of medical image segmentation.
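The abstract does not specify the exact form of the global-local cross-attention module. The following is a minimal, framework-agnostic NumPy sketch of one plausible form, in which queries come from the local (CNN) branch and keys/values from the global (Transformer) branch; all names and shapes here (`cross_attention`, the omitted linear projections) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(local_feats, global_feats):
    """Sketch of cross-attention between two encoder branches.
    Queries come from the local (CNN) branch; keys and values from the
    global (Transformer) branch, so each local token is re-expressed as
    a mixture of global context. Learned Q/K/V projections are omitted
    for brevity; shapes are (batch, tokens, channels)."""
    d_k = local_feats.shape[-1]
    Q, K, V = local_feats, global_feats, global_feats
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)  # (B, N_local, N_global)
    return softmax(scores) @ V                        # (B, N_local, C)

# toy check: 1 image, 4 local tokens, 16 global tokens, 8 channels
rng = np.random.default_rng(0)
out = cross_attention(rng.standard_normal((1, 4, 8)),
                      rng.standard_normal((1, 16, 8)))
print(out.shape)  # (1, 4, 8)
```

In the paper's setting, the same interaction can also be applied in the opposite direction (global queries attending to local keys/values) so that both branches are enriched before decoding.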
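The loss is stated to be a proportional mix of binary cross-entropy and Dice loss; the exact mixing ratio is not given in the abstract. A common formulation looks like the sketch below, where the weight `lam` is an assumed placeholder rather than the paper's value.

```python
import numpy as np

def bce_dice_loss(pred, target, lam=0.5, eps=1e-6):
    """Weighted sum of binary cross-entropy and Dice loss.
    pred: predicted probabilities in (0, 1); target: binary mask.
    lam is an assumed mixing weight; the paper's ratio is not given."""
    pred = np.clip(pred, eps, 1 - eps)
    bce = -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    inter = (pred * target).sum()
    dice = 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    return lam * bce + (1 - lam) * dice

# a confident, correct prediction drives both terms toward zero
t = np.array([[1.0, 0.0], [0.0, 1.0]])
good = bce_dice_loss(np.where(t > 0, 0.999, 0.001), t)
bad = bce_dice_loss(np.where(t > 0, 0.001, 0.999), t)
print(good, bad)  # good is near zero, bad is much larger
```

Mixing the two terms is a standard remedy for class imbalance in segmentation: BCE supervises every pixel, while the Dice term directly targets region overlap.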
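The four evaluation metrics used in the Result section are standard and can be computed from the confusion-matrix counts of a binary mask; the sketch below is a straightforward reference implementation, not the authors' evaluation script.

```python
import numpy as np

def segmentation_metrics(pred, target, eps=1e-6):
    """IoU, Dice, accuracy, and recall for binary masks (0/1 arrays)."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()    # correctly predicted lesion pixels
    fp = np.logical_and(pred, ~target).sum()   # false alarms
    fn = np.logical_and(~pred, target).sum()   # missed lesion pixels
    tn = np.logical_and(~pred, ~target).sum()  # correct background
    return {
        "iou": tp / (tp + fp + fn + eps),
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "accuracy": (tp + tn) / (tp + fp + fn + tn + eps),
        "recall": tp / (tp + fn + eps),
    }

m = segmentation_metrics(np.array([[1, 1], [0, 0]]),
                         np.array([[1, 0], [0, 0]]))
# tp=1, fp=1, fn=0, tn=2 -> iou=0.5, dice~0.667, accuracy=0.75, recall=1.0
```

Note that Dice is always at least as large as IoU on the same prediction, which is why Dice scores in the 90% range (as reported above) are common even when IoU is noticeably lower.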
Keywords: medical image segmentation; convolutional neural network (CNN); dual-encoder; cross-attention mechanism; Transformer