Combination of latent diffusion and U-shaped networks for HIFU treatment target region extraction
Objective In high-intensity focused ultrasound (HIFU) treatment, the target area contains a large amount of pathological information; thus, it must be accurately located and extracted from ultrasound monitoring images. Because biological tissues and target regions change their relative positions during treatment, the location of the treatment area may also change. Moreover, the diversity of diseases, the variability of tissues, and the complexity of target shapes make target region extraction in ultrasound medical images challenging. Nevertheless, computers can use advanced image processing and analysis algorithms, combined with big data and machine learning methods, to identify and locate target areas quickly and accurately, providing a reliable basis for quantitative clinical analysis. Traditional image segmentation algorithms mainly include threshold segmentation, edge detection, and region growing. These methods are sensitive to the complexity of ultrasound images, to noise, and to other image quality issues, so their segmentation results have poor accuracy and robustness. In addition, traditional methods usually require manual parameter selection, which limits their adaptability and generalization and makes them strongly dependent on the individual image. In recent years, deep learning-based methods have attracted widespread attention and made remarkable progress in medical image segmentation. Most of these methods are trained under strong supervision, which requires a large amount of labeled data to achieve good predictions. However, little ultrasound surveillance image data from HIFU therapy is available because of patient privacy, differences in acquisition devices, and the need for specialized physicians to label target areas manually. Consequently, networks cannot be trained adequately, and their segmentation results lack accuracy and robustness. Therefore, this study proposes a method for extracting the target region of HIFU treatment that combines latent diffusion with a U-shaped network.
Method First, we train a latent diffusion model on existing ultrasound surveillance images and their masks; the masks are input to the model as condition vectors so that it generates ultrasound surveillance images with the same contours. To further ensure that the quality of the generated images is close to that of the original images, we design an automatic filtering module that computes the Fréchet inception distance (FID) of the generated images with respect to the original images and applies a threshold to this score, which makes the data expansion of ultrasound surveillance images reliable. Second, we propose a novel U-shaped segmentation network (NUNet) whose main body adopts the encoder and decoder of U-Net. Atrous spatial pyramid pooling (ASPP) on the encoder side enlarges the receptive field of the network so that image features are extracted more effectively. Inspired by spatial and channel attention mechanisms, we design a dual attention skip connection module (DAttention-SK) to replace the original skip connections; it concatenates low-level and high-level information more efficiently and reduces the risk of losing information such as edge texture. In addition, multiple cross-entropy losses supervise the network to retain useful details and contextual information. Finally, the images generated by latent diffusion are combined with the existing ultrasound surveillance images to form the training set, which reduces segmentation errors caused by data scarcity and further improves segmentation accuracy.
Result All experiments were implemented in PyTorch on an NVIDIA GeForce RTX 3080 GPU. We trained latent diffusion on datasets collected from clinical treatments and assessed the quality of the generated images by FID. For the generative network, the initial learning rate was 1 × 10⁻⁴, the batch size was 2, and the number of training epochs was 200. For the segmentation network, the initial learning rate was 1 × 10⁻⁴, the batch size was 24, and the number of training epochs was 100. To verify the superiority of the proposed method, we compared it with popular generative and segmentation models. Experimental results showed that the ultrasound surveillance images generated by latent diffusion achieved better FID and learned perceptual image patch similarity (LPIPS) scores (0.172 and 0.072, respectively) than those produced by other generative models. On the dataset of ultrasound surveillance images of uterine fibroids clinically treated with HIFU, the proposed segmentation algorithm improved the mean intersection over union (MIoU) and the Dice similarity coefficient (DSC) by 2.67% and 1.39%, respectively, compared with the state-of-the-art PDF-UNet. To explore the generalization of the proposed algorithm, we further validated it on a breast ultrasound image dataset, where it improved MIoU and DSC by 2.11% and 1.36%, respectively, compared with the state-of-the-art M2SNet.
Conclusion A method for extracting the target region of HIFU treatment was proposed by combining latent diffusion and a U-shaped network. Latent diffusion was introduced for the first time into the generation of ultrasound surveillance images for HIFU treatment, addressing insufficient dataset diversity and data scarcity. Combining ASPP and the dual attention skip connection module in the segmentation network reduces the risk of losing information such as the edge texture of the target region and achieves accurate extraction of the target region in surveillance ultrasound images. Overall, the proposed algorithm alleviates, to a certain extent, the insufficient diversity of surveillance ultrasound datasets.
Keywords: high-intensity focused ultrasound (HIFU); image segmentation; image generation; loss function; latent diffusion
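The automatic filtering module in the Method part scores generated images by FID against the original images and keeps them only if the score passes a threshold. FID fits a Gaussian to each feature set and computes ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^(1/2)). The NumPy sketch below illustrates that idea under stated assumptions, not the authors' code: features are assumed to be already extracted (standard FID uses Inception-v3 pooling features), and FID is computed per batch of generated images, since a single image has no covariance; the abstract does not say how the score is grouped. `frechet_distance`, `filter_by_fid`, and the threshold of 1.0 are hypothetical.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    Both inputs have shape (n_samples, n_features). Standard FID uses
    Inception-v3 pooling features; here the extractor is left abstract.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    # np.cov squeezes 1-feature inputs to a scalar, so force 2-D matrices.
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the eigenvalues
    # of the symmetric PSD matrix sqrt(cov_a) @ cov_b @ sqrt(cov_a).
    vals, vecs = np.linalg.eigh(cov_a)
    sqrt_a = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    inner_eigs = np.linalg.eigvalsh(sqrt_a @ cov_b @ sqrt_a)
    tr_sqrt = np.sqrt(np.clip(inner_eigs, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


def filter_by_fid(real_feats, candidate_batches, threshold):
    """Keep generated batches whose FID to the real features does not exceed threshold."""
    kept = []
    for batch in candidate_batches:
        score = frechet_distance(batch, real_feats)
        if score <= threshold:
            kept.append((batch, score))
    return kept


# Toy usage with synthetic 8-D "features" (no real images involved).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
close = rng.normal(size=(500, 8))          # same distribution as `real`
far = rng.normal(loc=3.0, size=(500, 8))   # shifted distribution
kept = filter_by_fid(real, [close, far], threshold=1.0)  # hypothetical threshold
print(len(kept))  # 1
```

Lower FID means the generated distribution is closer to the real one, so the filter keeps batches at or below the threshold and discards the shifted batch.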
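ASPP enlarges the receptive field by running several atrous (dilated) convolutions with different rates in parallel on the same feature map. A 3 × 3 kernel with rate r spans 3 + 2(r − 1) pixels per side while keeping only nine weights. The single-channel NumPy sketch below illustrates this under simplifying assumptions: the rates (1, 6, 12) are hypothetical because the abstract does not give NUNet's settings, and a real ASPP module uses learned multi-channel kernels followed by a fusion convolution, which are omitted here. The function names are illustrative.

```python
import numpy as np


def atrous_conv2d(x: np.ndarray, kernel: np.ndarray, rate: int) -> np.ndarray:
    """'Same'-size atrous (dilated) cross-correlation of a single-channel map.

    `kernel` is k x k with odd k; its taps are spaced `rate` pixels apart.
    """
    k = kernel.shape[0]
    pad = rate * (k - 1) // 2
    xp = np.pad(x, pad)  # zero padding keeps the output size equal to the input
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # Each tap reads the input shifted by a multiple of `rate`.
            out += kernel[i, j] * xp[i * rate:i * rate + h, j * rate:j * rate + w]
    return out


def effective_kernel_size(k: int, rate: int) -> int:
    """Spatial extent covered by a k x k kernel dilated by `rate`."""
    return k + (k - 1) * (rate - 1)


def aspp(x: np.ndarray, kernels, rates) -> np.ndarray:
    """Run parallel atrous branches and stack their outputs as channels."""
    return np.stack([atrous_conv2d(x, k, r) for k, r in zip(kernels, rates)], axis=0)


rng = np.random.default_rng(0)
feat = rng.normal(size=(32, 32))
rates = (1, 6, 12)  # hypothetical dilation rates
kernels = [rng.normal(size=(3, 3)) for _ in rates]
out = aspp(feat, kernels, rates)
print(out.shape)                                     # (3, 32, 32)
print([effective_kernel_size(3, r) for r in rates])  # [3, 13, 25]
```

All branches keep the input resolution, so their outputs can be stacked directly; the large-rate branches see wide context while the rate-1 branch preserves local detail.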
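The abstract states only that DAttention-SK is inspired by spatial and channel attention and replaces U-Net's plain skip connections. One plausible reading, sketched here in NumPy as an assumption rather than the published design, gates the encoder features with an SE-style channel attention and then a CBAM-style spatial attention before concatenating them with the decoder features. The channel-then-spatial order, the reduction ratio of 2, the scalar mixing of the pooled maps (standing in for CBAM's 7 × 7 convolution), the random weights, and all function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """SE-style gate: global average pool, two-layer MLP, sigmoid. x is (C, H, W)."""
    squeezed = x.mean(axis=(1, 2))                 # (C,)
    excited = w2 @ np.maximum(w1 @ squeezed, 0.0)  # (C,)
    return sigmoid(excited)[:, None, None]         # (C, 1, 1)


def spatial_attention(x, a_mean, a_max, bias=0.0):
    """CBAM-style gate from channel-wise mean and max maps.

    The two maps are mixed by scalar weights, a stand-in for CBAM's 7x7 conv.
    """
    mixed = a_mean * x.mean(axis=0) + a_max * x.max(axis=0) + bias  # (H, W)
    return sigmoid(mixed)[None, :, :]                                # (1, H, W)


def dual_attention_skip(enc, dec, w1, w2, a_mean, a_max):
    """Reweight encoder features by channel, then spatial attention; concatenate."""
    refined = enc * channel_attention(enc, w1, w2)
    refined = refined * spatial_attention(refined, a_mean, a_max)
    return np.concatenate([refined, dec], axis=0)


rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
enc = rng.normal(size=(C, H, W))   # low-level encoder features
dec = rng.normal(size=(4, H, W))   # upsampled high-level decoder features
w1 = rng.normal(size=(C // 2, C))  # hypothetical reduction ratio of 2
w2 = rng.normal(size=(C, C // 2))
fused = dual_attention_skip(enc, dec, w1, w2, a_mean=1.0, a_max=1.0)
print(fused.shape)  # (12, 16, 16)
```

Because both gates lie in (0, 1), this module can only attenuate encoder activations; with trained weights it would decide which channels and locations, such as edges, pass through to the decoder.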
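Supervising a network with multiple cross-entropy losses is commonly implemented as deep supervision: several decoder outputs each receive their own loss, and the terms are summed. The NumPy sketch below assumes binary (target versus background) cross-entropy, side outputs already upsampled to the mask size and passed through a sigmoid, and equal weights. The abstract does not specify which outputs are supervised or how the terms are weighted, so `deep_supervision_loss` and its defaults are hypothetical.

```python
import numpy as np


def binary_cross_entropy(prob, target, eps=1e-7):
    """Mean pixel-wise BCE between foreground probabilities and a 0/1 mask."""
    p = np.clip(prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def deep_supervision_loss(side_outputs, target, weights=None):
    """Weighted sum of BCE terms, one per decoder output.

    Each side output is assumed to be upsampled to the mask size and passed
    through a sigmoid already. Equal weights are a hypothetical default.
    """
    if weights is None:
        weights = [1.0 / len(side_outputs)] * len(side_outputs)
    return sum(w * binary_cross_entropy(p, target) for w, p in zip(weights, side_outputs))


target = np.array([[0.0, 1.0], [1.0, 1.0]])
confident = target.copy()              # near-perfect prediction
uncertain = np.full_like(target, 0.5)  # uninformative prediction
loss = deep_supervision_loss([confident, uncertain], target)
print(round(loss, 4))  # 0.3466
```

The uninformative output contributes log 2 ≈ 0.693 and the confident one almost nothing, so the equally weighted total is about half of log 2; every supervised output is pushed toward the mask, not just the final one.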
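MIoU and DSC, the two segmentation metrics reported in the Result part, are both overlap ratios between a predicted mask A and a ground-truth mask B: IoU = |A ∩ B| / |A ∪ B| and DSC = 2|A ∩ B| / (|A| + |B|). The NumPy sketch below computes them for boolean masks. It assumes MIoU averages IoU over two classes (target region and background), a common convention that the abstract does not confirm; the function names are illustrative.

```python
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two boolean masks."""
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else float(np.logical_and(pred, gt).sum() / union)


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = pred.sum() + gt.sum()
    return 1.0 if total == 0 else float(2.0 * np.logical_and(pred, gt).sum() / total)


def mean_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU averaged over the target class and the background class."""
    return (iou(pred, gt) + iou(~pred, ~gt)) / 2.0


gt = np.zeros((4, 4), dtype=bool)
gt[:2, :2] = True    # 4-pixel target region
pred = np.zeros((4, 4), dtype=bool)
pred[:2, :3] = True  # 6-pixel prediction covering the target
print(round(dice(pred, gt), 4), round(mean_iou(pred, gt), 4))  # 0.8 0.75
```

For a single class the two metrics are linked by DSC = 2·IoU / (1 + IoU); here the foreground IoU of 2/3 gives DSC = 0.8.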