Multi-scale fusion-enhanced ultrasound elastic image segmentation for mediastinal lymph nodes
Objective Ultrasound elastography enables non-invasive diagnosis of lesion tissues by analyzing the differences in hardness among body tissues, and it is gradually being adopted in the diagnosis of many diseases. In bronchial ultrasound elastography, accurately segmenting mediastinal lymph nodes from images is significant for diagnosing whether lung cancer has metastasized and plays an important role in the subsequent staging and diagnosis of cancer. Manual segmentation performed by radiologists is time-consuming, and research on automated segmentation, specifically for ultrasound elastic images, is limited. Therefore, deep learning-based assisted segmentation methods have attracted considerable attention. Although ultrasound elastic images can provide some guidance for segmenting regions of interest, the obscured texture information in these regions also makes segmentation challenging. Existing research has focused primarily on the encoder structure of the model, particularly by incorporating different pre-trained models to accommodate the three-channel data format of ultrasound elastic images. However, limited research has addressed the intermediate features obtained by the encoder and decoder structures, resulting in less precise segmentation results. Therefore, this study proposes a network for mediastinal lymph node segmentation, called the attention-based multi-scale fusion-enhanced ultrasound elastic image segmentation network for mediastinal lymph nodes (AMFE-UNet).

Method First, a pre-trained dense convolutional network (DenseNet) with dense connections is introduced into the U-Net architecture to extract channel and position information from ultrasound elastic images. Second, to model the boundaries and textures of the nodules at different scales and scopes, the decoder module is enhanced with efficient channel attention (ECA) and dilated convolutions. Three dilated convolution branches and one pooling branch are set up in each decoder module, and different combinations of the branch outputs yield the following four decoder structures (one variant is sketched below): 1) Decoder-A: the branch outputs are added and passed through an ECA module. 2) Decoder-B: the branch outputs are concatenated along the channel dimension and passed through an ECA module. 3) Decoder-C: each branch is equipped with its own ECA module, and the branch outputs are concatenated along the channel dimension. 4) Decoder-D: the branch outputs are densely connected and passed through an ECA module. Lastly, a selective kernel network (SK-Net) is used to fuse the features obtained from the encoder and decoder, ensuring a more comprehensive integration. In the experiments, the proposed models are implemented in Python 3.7 and PyTorch 1.12. The image processing workstation is equipped with an Intel i9-13900K CPU and two NVIDIA RTX 4090 GPUs, each with 24 GB of memory. The model parameters are initialized with the default initialization method in PyTorch, and the Adam optimizer is used to update the network parameters. The learning rate is initially set to 0.0001 with a weight decay coefficient of 0.1 and is decayed every 90 iterations. The Dice coefficient is used as the loss function, and the model is trained for 190 epochs.
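As an illustration of the decoder design, the following is a minimal PyTorch sketch of the Decoder-A variant: three dilated convolution branches and one global pooling branch whose outputs are summed and then refined by an ECA module. The dilation rates, channel widths, and normalization layers are assumptions not given in the abstract, and the upsampling and skip connection of the full decoder block are omitted.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: 1-D convolution over pooled channel descriptors."""
    def __init__(self, channels, k_size=3):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.avg_pool(x)                                   # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(-1, -2))         # (B, 1, C)
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))    # (B, C, 1, 1)
        return x * y.expand_as(x)

class DecoderA(nn.Module):
    """Decoder-A sketch: dilated branches + pooling branch, summed, then ECA.
    Dilation rates are assumed values."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        # Pooling branch: global context broadcast back to the spatial grid.
        self.pool_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.ReLU(inplace=True),
        )
        self.eca = ECA(out_ch)

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        pooled = self.pool_branch(x)
        feats.append(pooled.expand_as(feats[0]))
        return self.eca(sum(feats))   # Decoder-A: add branch outputs, then ECA
```

Decoder-B and Decoder-C would replace the summation with channel-wise concatenation (with ECA applied after or per branch, respectively), and Decoder-D would chain the branches with dense connections before the ECA module.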
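One plausible reading of the SK-Net-based skip connection is a selective, softmax-weighted fusion of the encoder skip feature and the corresponding decoder feature, sketched below. It assumes the two feature maps already share the same shape; the reduction ratio r and the two-branch setup are assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn

class SKFusion(nn.Module):
    """Selective-kernel style fusion of an encoder skip feature and a decoder feature."""
    def __init__(self, channels, r=8):
        super().__init__()
        hidden = max(channels // r, 4)
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.ReLU(inplace=True),
        )
        # One attention vector per branch (encoder / decoder).
        self.attn = nn.Conv2d(hidden, channels * 2, 1, bias=False)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, enc_feat, dec_feat):
        # enc_feat, dec_feat: (B, C, H, W) with matching shapes
        fused = enc_feat + dec_feat
        s = self.squeeze(fused)                 # (B, hidden, 1, 1)
        a = self.attn(s)                        # (B, 2*C, 1, 1)
        b, c = enc_feat.shape[:2]
        a = self.softmax(a.view(b, 2, c, 1, 1)) # soft selection over the two branches
        return enc_feat * a[:, 0] + dec_feat * a[:, 1]
```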
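The reported optimization setup (Adam, initial learning rate 0.0001, weight decay 0.1, decay every 90 steps, Dice-based loss, 190 epochs) can be wired up roughly as follows. The decay factor gamma, stepping the scheduler once per epoch, and the stand-in model and data are assumptions or hypothetical placeholders, not details from the paper.

```python
import torch
import torch.nn as nn

class DiceLoss(nn.Module):
    """Soft Dice loss for binary segmentation masks."""
    def __init__(self, eps=1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, logits, target):
        prob = torch.sigmoid(logits)
        inter = (prob * target).sum(dim=(1, 2, 3))
        union = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
        dice = (2 * inter + self.eps) / (union + self.eps)
        return 1 - dice.mean()

# Stand-in for AMFE-UNet; the real architecture is not reproduced here.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
criterion = DiceLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=0.1)
# Step decay every 90 steps; gamma=0.1 is an assumed value.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=90, gamma=0.1)

# Hypothetical stand-in data: random 3-channel images with binary masks.
train_loader = [(torch.rand(2, 3, 128, 128),
                 torch.randint(0, 2, (2, 1, 128, 128)).float())]

for epoch in range(190):
    for images, masks in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), masks)
        loss.backward()
        optimizer.step()
    scheduler.step()
```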
Result The experiments are performed on a collected dataset of bronchial ultrasound elastic images with six-fold cross-validation. The evaluation metrics include the Dice coefficient, sensitivity, specificity, precision, intersection over union (IoU), the 95th-percentile Hausdorff distance (HD95), the number of parameters, and GFLOPs. The first five metrics range between 0 and 1, and higher values indicate better segmentation performance; HD95 has no fixed range, and lower values indicate better segmentation performance. The ablation experiments confirm the improvements brought by the proposed skip connection structure and decoder structure. The model using SK-Net as the skip connections is only slightly less sensitive than Dense-UNet, while the remaining five metrics are better than those of Dense-UNet. The four models using the multi-scale fusion-enhanced decoder outperform Dense-UNet by 0.4% to 0.9% in Dice coefficient and by up to 2% in precision. Two final models are designed according to the ablation experiments: AMFE-UNet A and AMFE-UNet B. AMFE-UNet is compared with a variety of models, including U-Net, Att-UNet, Seg-Net, DeepLabV3+, Trans-UNet, U-Net++, BPAT-UNet, CTO, and ACE-Net. The Dice coefficient of AMFE-UNet is 86.59% on average, an improvement of 1.983% over U-Net. AMFE-UNet A is optimal in terms of Dice coefficient, precision, and specificity, whereas AMFE-UNet B is optimal in terms of sensitivity, IoU, and HD95. The class activation maps demonstrate that AMFE-UNet achieves better segmentation sensitivity and completeness by focusing on the content of the region at the lower levels of the network and on the boundaries of the region at the higher levels; the other networks focus only on the content of the region and are ineffective at segmenting its boundaries. The loss curves for training and testing indicate that AMFE-UNet B converges faster and segments better than AMFE-UNet A.

Conclusion Extensive experiments demonstrate the excellent segmentation performance of the attention-based AMFE-UNet on ultrasound elastic images, which is significant for future research on multichannel medical images. The code is available at https://github.com/Philo-github/AMFE-UNet.
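For reference, the overlap metrics reported above (Dice coefficient and IoU) can be computed from binary prediction and ground-truth masks as in the sketch below; the smoothing constant eps is an assumption, and HD95 additionally requires a boundary (surface-distance) computation that is not shown here.

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-6):
    """Dice coefficient and IoU for binary masks (arrays of 0/1 values)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, target).sum() + eps)
    return dice, iou

# Example with a toy 4x4 mask pair: Dice = 2*4/12 ≈ 0.667, IoU = 4/8 = 0.5.
pred = np.array([[1, 1, 0, 0]] * 4)
target = np.array([[1, 0, 0, 0]] * 4)
print(dice_and_iou(pred, target))
```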