
Weather-Degraded Image Restoration Based on Visual Prompt Learning

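Although existing weather-degraded image restoration methods perform well on single-weather removal tasks, they cannot adapt to the changing weather types of real-world scenes. To this end, this paper proposes a weather-degraded image restoration algorithm based on visual prompt learning, a new paradigm that combines a pre-trained language-image model with the weather-degraded image restoration task. The algorithm first designs a Query Prompt Constrained Network (QPC-Net), which uses the text encoder and image encoder of the contrastive language-image pre-training model to directly encode, from a given degraded image, the latent descriptive features of its corresponding ground-truth background. The algorithm also includes an Example Prompt Guided Network (EPG-Net), which uses given example images to guide a pre-trained diffusion model in removing the corresponding weather degradation from the query image. Compared with existing algorithms under a similar setting, the proposed algorithm improves peak signal-to-noise ratio by 2.11 dB and structural similarity by 4.74% on average across eight weather degradation datasets.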
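The headline result is stated as a gain in peak signal-to-noise ratio (PSNR), so a small illustration of how that metric is computed may help. This is a minimal sketch of the standard PSNR definition, not the authors' evaluation code; the function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for the example.

```python
import math


def psnr(restored, reference, max_value=255.0):
    """PSNR in dB between two images given as equal-length flat lists of pixels.

    Assumption: pixel values lie in [0, max_value]; 255 matches 8-bit images.
    """
    if len(restored) != len(reference) or not restored:
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(restored, reference)) / len(restored)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    reference = [0, 64, 128, 255]   # hypothetical clean pixels
    restored = [2, 66, 126, 253]    # hypothetical restored pixels
    print(f"PSNR: {psnr(restored, reference):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, a gain of 2.11 dB on a single image pair with a fixed peak value corresponds to the error shrinking by a factor of about 10^(-0.211) ≈ 0.62. The figure reported here is an average over datasets, so this conversion is only indicative.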
Weather-Degraded Image Restoration Based on Visual Prompt Learning
Images captured in real-world scenarios often suffer from weather degradations such as rain, haze, and snow, which occur randomly and can occlude details and degrade content, thereby impairing downstream high-level computer vision algorithms. Existing methods for weather-degraded image restoration can be categorized into task-specific, task-aligned, and all-in-one types. However, the first two types require separate training for each weather degradation and struggle to adapt to the diverse weather conditions encountered in real-world scenes. Although all-in-one methods achieve competitive performance across adverse weather removal tasks, they fail to adapt to unseen weather degradations, resulting in poor generalization. To this end, this work proposes a weather-degraded image restoration algorithm based on visual prompt learning, a novel paradigm that integrates a pre-trained language-image model with weather-degraded image restoration. Specifically, even text inputs with similar meanings may yield significantly different latent features when processed by the text encoder of the contrastive language-image pre-training (CLIP) model. The general expectation of image restoration is that, given a degraded image, the model generates its one corresponding restored image rather than multiple different reconstructions. Therefore, directly using text to guide image reconstruction may lead to an unstable solution space and often fails to meet this expectation. In response, a query prompt constrained network (QPC-Net) is introduced, which uses the text encoder and image encoder from CLIP to directly encode the latent descriptive features of the corresponding ground truth from the given degraded image. These latent features are then embedded into a pre-trained Stable Diffusion model through the cross-attention mechanism, thereby constraining the reverse sampling process and facilitating content reconstruction. QPC-Net consists of two image encoders, one with frozen parameters and the other trainable. Moreover, many existing weather-degraded image restoration algorithms primarily learn strict pixel-level mappings between degraded and clean images and do not explore the knowledge specific to different restoration tasks. This limitation makes it difficult for them to learn the context of weather-degraded image restoration tasks not covered by the training dataset, so they struggle to adapt to different restoration tasks. To address this issue, an example prompt guided network (EPG-Net) is developed, which uses given example images to guide the pre-trained Stable Diffusion model in learning the context knowledge of the corresponding restoration task, thereby removing the degradations from query images. Additionally, although acquiring suitable example images for complex mixed weather-degraded image restoration tasks is challenging, EPG-Net can learn the context knowledge from multiple sets of example images. In experimental evaluations on eight seen weather degradation datasets and seven unseen datasets, the proposed algorithm demonstrates significant improvements. Specifically, on the seen weather-degraded datasets, it achieves average improvements of 2.11 dB in peak signal-to-noise ratio (PSNR), 4.74% in structural similarity (SSIM), 41.08% in learned perceptual image patch similarity (LPIPS), and 24.25% in natural image quality evaluator (NIQE) compared with existing algorithms under a similar setting. On the unseen weather-degraded datasets, it achieves average improvements of 1.88 dB in PSNR, 5.61% in SSIM, 21.40% in LPIPS, and 29.29% in NIQE.
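The abstract says the prompt features are embedded into the diffusion model through cross-attention. As a rough illustration of that mechanism only, the sketch below computes scaled dot-product cross-attention in plain Python. It is not the authors' implementation: the function names are hypothetical, the learned query/key/value projections of a real attention layer are omitted, and the assignment of roles (image features as queries, prompt features as keys and values) follows the usual Stable Diffusion conditioning pattern rather than anything stated in the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries: n_q vectors of size d (assumed: features of the image being denoised)
    keys:    n_k vectors of size d (assumed: prompt features)
    values:  n_k vectors of size d_v (assumed: prompt features)
    Returns n_q vectors of size d_v, each a weighted mix of the values.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Similarity of this query to every prompt token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output channel at a time.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


if __name__ == "__main__":
    image_features = [[1.0, 0.0]]               # hypothetical query
    prompt_features = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical keys and values
    print(cross_attention(image_features, prompt_features, prompt_features))
```

Each output row is a convex combination of the prompt vectors, which is how conditioning features can steer every denoising step toward content consistent with the prompt.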

computer vision; visual prompt learning; in-context learning; image restoration; diffusion model

Wen Yuanbo, Gao Tao, An Yisheng, Li Ziqi, Chen Ting


School of Information Engineering, Chang'an University, Xi'an 710064

Institute of Data Science and Artificial Intelligence, Chang'an University, Xi'an 710064


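National Key R&D Program of China (2023YFB2504703); Shaanxi Provincial International Science and Technology Cooperation Program (2024GH-YBXM-24); National Natural Science Foundation of China (52172379); Fundamental Research Funds for the Central Universities, Chang'an University (300102242901)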

2024

Chinese Journal of Computers
China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences


CSTPCD; Peking University Core Journals
Impact factor: 3.18
ISSN:0254-4164
Year, volume (issue): 2024, 47(10)