
Diabetic Retinopathy Lesion Segmentation Based on Hierarchical Feature Progressive Fusion in Retinal Fundus Images

Segmenting diabetic retinopathy (DR) lesions in fundus images with deep learning can effectively assist ophthalmologists in DR screening, grading, and monitoring of treatment progress. However, different DR lesions show high inter-class similarity in scale, shape, location, color, and texture, which poses a considerable challenge for automatic DR image segmentation. To address this, our group proposes a novel progressive multi-feature fusion network (PMFF-Net) based on deep semantic and edge information to segment multiple classes of DR lesions simultaneously. The network mainly comprises a hybrid Transformer (HT) module, a selective edge aggregation (SEA) module, a gradual characteristic fusion (GCF) module, and a dynamic attention (DA) module. The HT module integrates a convolutional neural network (CNN), multiscale channel attention, and a Transformer to strengthen the representation of DR lesion features. The GCF module progressively fuses features from adjacent encoder layers under the guidance of high-level semantic features, effectively bridging the semantic gaps between features at different levels. The DA module adopts an adaptive learning strategy to dynamically infer and select features from the fused representation, improving consistency across multiscale features. The SEA module selectively aggregates DR edge features and semantic information to refine lesion boundary contours and recalibrate lesion locations. The proposed PMFF-Net achieves mDice and mIoU of 45.11% and 33.39% on the IDRiD dataset and 36.64% and 35.04% on the DDR dataset, outperforming current state-of-the-art DR segmentation methods. Cross-dataset testing further verifies the model's good generalization ability, and ablation experiments confirm the effectiveness of each module in the proposed model. These excellent segmentation results indicate that the designed PMFF-Net has great application potential in clinical DR screening and detection practice.
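The reported mDice and mIoU average the per-class Dice coefficient and intersection-over-union over the four lesion types. A minimal pure-Python sketch of how such class-averaged scores are computed on binary masks (illustrative only; the paper's exact evaluation protocol may differ):

```python
def dice(pred, gt):
    """Dice coefficient between two binary masks given as flat 0/1 lists."""
    inter = sum(p & g for p, g in zip(pred, gt))
    total = sum(pred) + sum(gt)
    return 2 * inter / total if total else 1.0

def iou(pred, gt):
    """Intersection-over-union (Jaccard index) between two binary masks."""
    inter = sum(p & g for p, g in zip(pred, gt))
    union = sum(p | g for p, g in zip(pred, gt))
    return inter / union if union else 1.0

def mean_metric(metric, preds, gts):
    """Average a per-class metric over class-wise mask pairs (e.g. EX, HE, MA, SE)."""
    scores = [metric(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)

# Toy example with two lesion classes on a 4-pixel image:
pred = [[1, 1, 0, 0], [0, 1, 1, 0]]
gt = [[1, 0, 0, 0], [0, 1, 1, 1]]
mdice = mean_metric(dice, pred, gt)  # (2/3 + 4/5) / 2, about 0.733
```

The class-wise averaging explains why mDice can sit well below per-image Dice: a single hard class (e.g. microaneurysms) pulls the mean down even when larger lesions are segmented well.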
Diabetic Retinopathy Lesion Segmentation Based on Hierarchical Feature Progressive Fusion in Retinal Fundus Images
Objective Diabetic retinopathy (DR) is one of the most common complications of diabetes and one of the main causes of irreversible vision impairment or permanent blindness among the working-age population. Early detection has been shown to slow the disease's progression and prevent vision loss. Fundus photography is a widely used modality for DR-related lesion identification and large-scale screening owing to its non-invasive and cost-effective characteristics. Ophthalmologists typically observe fundus lesions, including microaneurysms (MAs), hemorrhages (HEs), hard exudates (EXs), and soft exudates (SEs), in images to perform manual DR diagnosis and grading for all suspected patients. However, expert identification of these lesions is cumbersome, time-consuming, and easily affected by individual expertise and clinical experience. With the increasing prevalence of DR, automated segmentation methods are urgently required to identify multiclass fundus lesions. Recently, deep-learning technology, represented by convolutional neural networks (CNNs) and Transformers, has progressed significantly in the domain of medical-image analysis and has become the mainstream technology for DR-related lesion segmentation. The most commonly used methods are semantic-segmentation-oriented CNNs, Transformers, or their combinations. These deep-learning methods exhibit promising results in terms of both accuracy and efficiency. Nevertheless, CNN-based methods capture global contextual information poorly owing to their intrinsically limited receptive field, whereas Transformer-based approaches exhibit weak local inductive biases and subpar perception of multiscale feature dependencies. Although models combining CNNs with Transformers exhibit clear advantages, they extract deep semantic features and directly concatenate features from the same level without fully considering the importance of concrete boundary information for small-lesion segmentation, resulting in inadequate feature interaction between adjacent layers and conflicts among different feature scales. Moreover, these methods only focus on a certain type of DR lesion and seldom delineate multiple lesion types simultaneously, thereby hampering their practical clinical application.

Methods In this study, we developed a novel progressive multifeature fusion network based on an encoder-decoder U-shaped structure, named PMFF-Net, to achieve accurate multiclass DR-related fundus lesion segmentation. The overall framework of the proposed PMFF-Net is shown in Fig. 1. It primarily comprises an encoder module embedding a hybrid Transformer (HT) module, a gradual characteristic fusion (GCF) module, a selective edge aggregation (SEA) module, a dynamic attention (DA) module, and a decoder module. For the encoder module, we sequentially cascaded four HT blocks to form four stages that excavate multiscale long-range features and local spatial information. Given a fundus image I∈R^(H×W×C) (with height H, width W, and C channels) as the input, we first applied a convolutional stem with a convolutional layer and a MaxPooling layer for patch partitioning, which produced N patches X. The resulting patches X were embedded into image tokens E using a trainable linear projection, and we denoted the output of the convolutional stem as F0=E. Subsequently, the embedded tokens E were fed into the four encoder stages to generate hierarchical feature maps Fi∈R^(H/2^(i+1)×W/2^(i+1)×Ci) (i=1,2,3,4). The designed GCF module gradually aggregates adjacent features of various scales under the guidance of high-level semantic cues to generate an enhanced feature representation F_GCFi (i=2,3,4) in each layer except the first, thereby narrowing the semantic gaps between features at different levels. Subsequently, the presented DA module dynamically selects useful features and refines the merged characteristics using a dynamic learning algorithm to obtain consistent multiscale features Ai (i=2,3,4). Meanwhile, the developed SEA module incorporates low-level boundary features F1 and high-level semantic features A3 and A4 to dynamically establish the association between lesion areas and edges, refine lesion boundary features, and recalibrate lesion locations. In the decoder module, we introduced a successive patch-expanding layer between adjacent resolution blocks to double the size of the feature map and halve the number of channels. Within each convolution block, a convolution layer was embedded to learn informative features. Finally, we applied a prediction head to obtain the lesion-segmentation probability map Y∈R^(H×W×K), where K indicates the number of categories corresponding to K−1 lesion maps and a background map.

Results and Discussions We used two publicly available DR datasets, IDRiD and DDR, to verify the proposed PMFF-Net. The comparison results (Tables 1 and 2) show that our PMFF-Net performs better than current state-of-the-art DR lesion-segmentation models on both datasets, with mDice and mIoU values of 45.11% and 33.39%, respectively, when predicting EX, HE, MA, and SE simultaneously on the IDRiD dataset, and mDice and mIoU values of 36.64% and 35.04%, respectively, on the DDR dataset. Specifically, compared with H2Former, our model achieves mDice and mIoU values that are 3.94 and 3.28 percentage points higher, respectively, on the IDRiD dataset, and values 4.55 and 4.69 percentage points higher, respectively, than those of PMCNet. On the DDR dataset, our model achieves the best segmentation results, outperforming H2Former by 5.17 and 6.15 percentage points in terms of mDice and mIoU, respectively, and surpassing PMCNet by 6.36 and 7.43 percentage points, respectively. Meanwhile, our model can provide real-time DR-lesion analysis, with analysis times of approximately 34.74 and 38.48 ms per image on the IDRiD and DDR datasets, respectively. The visualized comparison results in Figs. 6 and 7 indicate that the predictions of our model are closer to the ground truth than those of other advanced methods. The cross-dataset validation results in Tables 3 and 4 show that our model offers better generalizability than other advanced segmentation methods. The superior segmentation performance of the developed PMFF-Net may be attributed to the ability of the HT module to capture global context information and local spatial details, the GCF module gradually aggregating different levels of multiscale features under high-level semantic guidance, the DA module eliminating irrelevant noise and enhancing the identification of discriminative DR-lesion features, and the SEA block establishing a constraint between the DR-lesion region and its boundary. Additionally, the effectiveness of the components of the proposed PMFF-Net, including the HT, GCF, DA, and SEA modules, was verified on the IDRiD dataset.

Conclusions In this study, we developed a novel PMFF-Net for the simultaneous segmentation of four types of DR lesions in retinal fundus images. In the PMFF-Net, we constructed an HT module by elegantly integrating a CNN, multiscale channel attention, and a Transformer to model the long-range global dependency of lesions and their local spatial features. The GCF module was designed to merge features from adjacent encoder layers progressively under the guidance of high-level semantic cues. We utilized a DA module to suppress irrelevant noisy interference and dynamically refine the fused multiscale features from the GCF module. Furthermore, we incorporated an SEA module to emphasize lesion boundary contours and recalibrate lesion locations. Extensive experimental results on the IDRiD and DDR datasets show that our PMFF-Net performs better than other competitive segmentation methods. Cross-dataset validation further demonstrates the excellent generalizability of our model. Finally, we demonstrated the effectiveness and necessity of the proposed modules via a comprehensive ablation analysis. The developed method can serve as a general segmentation framework and can be applied to segment other types of biomedical images.
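The abstract above fixes the resolution of each encoder stage at Fi∈R^(H/2^(i+1)×W/2^(i+1)×Ci) and has each decoder patch-expanding step double the spatial size while halving the channels. A short sketch of that shape arithmetic (the channel widths Ci are illustrative assumptions, not values taken from the paper):

```python
def encoder_shapes(H, W, channels=(64, 128, 256, 512)):
    """Hierarchical feature-map sizes F_i = H/2**(i+1) x W/2**(i+1) x C_i
    for encoder stages i = 1..4. The channel widths are illustrative."""
    return [(H // 2 ** (i + 1), W // 2 ** (i + 1), c)
            for i, c in enumerate(channels, start=1)]

def patch_expand(shape):
    """One decoder patch-expanding step: double spatial size, halve channels."""
    h, w, c = shape
    return (2 * h, 2 * w, c // 2)

stages = encoder_shapes(512, 512)
# stages[0] == (128, 128, 64); stages[3] == (16, 16, 512)
```

For a 512×512 input this yields stage resolutions of 128, 64, 32, and 16, and expanding the deepest map (16, 16, 512) gives (32, 32, 256), consistent with a symmetric U-shaped encoder-decoder.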

image processing; diabetic retinopathy lesion segmentation; deep semantics; edge information; dynamic attention; gradual characteristic fusion

Ding Pengchao, Li Feng


School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China


2024

Chinese Journal of Lasers
Chinese Optical Society; Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 2.204
ISSN:0258-7025
Year, Volume (Issue): 2024, 51(21)