Blurred image restoration fusing nonlocal feature representations

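Abstract (translated from Chinese): Objective. End-to-end single-image deblurring methods based on deep learning have achieved excellent results. However, the building blocks in most networks focus only on extracting local features and are limited in modeling long-range pixel dependencies. To address this problem, a method is proposed that introduces both local and nonlocal features into the network. Method. Existing high-performing building blocks are used to extract local features. A large-window Transformer block is divided into smaller non-overlapping image patches, and only one maximum-value point is sampled from each patch to take part in the self-attention computation, so that nonlocal features are extracted without consuming excessive computational resources. Finally, the two modules are combined so that local and nonlocal information are coupled within the block, effectively capturing richer feature information. Results. Experiments show that, compared with modules that can extract only local information, the proposed module improves the peak signal-to-noise ratio (PSNR) by no less than 1.3 dB. In addition, two image restoration networks that couple local and nonlocal features are designed and applied to single-image motion deblurring and defocus deblurring, respectively. Compared with Uformer (a general U-shaped Transformer for image restoration), the average PSNR on the motion deblurring test sets GoPro (deep multi-scale convolutional neural network for dynamic scene deblurring) and HIDE (human-aware motion deblurring) increases by 0.29 dB and 0.25 dB, respectively, with fewer floating-point operations. On the defocus deblurring test set DPD (defocus deblurring using dual-pixel data), the average PSNR increases by 0.42 dB. Conclusion. The proposed method successfully introduces nonlocal information within the block, enabling the model to capture local and nonlocal features simultaneously, obtain richer feature representations, and improve the performance of the deblurring network. The restored images also have clearer edges and are closer to the ground-truth images.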
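The sampling idea described in the abstract (split a large window into non-overlapping patches, keep one maximum-value point per patch, and run self-attention only over those points) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the single-channel feature map, the scalar tokens, the identity query/key/value projections, and the function names `sample_patch_maxima` and `sparse_self_attention` are all assumptions made for illustration.

```python
import math

def sample_patch_maxima(feat, patch):
    """Split an H x W map into non-overlapping patch x patch blocks and
    keep the single largest value of each block (one token per block)."""
    h, w = len(feat), len(feat[0])
    assert h % patch == 0 and w % patch == 0, "map must tile evenly"
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            block = [feat[r][c]
                     for r in range(top, top + patch)
                     for c in range(left, left + patch)]
            tokens.append(max(block))
    return tokens

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_self_attention(tokens):
    """Scaled dot-product self-attention over scalar tokens, with identity
    query/key/value projections (an assumption; real blocks learn them)."""
    out = []
    for q in tokens:
        # Token dimension is 1, so the usual 1/sqrt(d) scale equals 1.
        weights = softmax([q * k for k in tokens])
        out.append(sum(w * v for w, v in zip(weights, tokens)))
    return out

# Hypothetical 8 x 8 window split into 4 x 4 patches: 4 tokens instead of 64.
window = [[((r * 8 + c) % 7) / 7.0 for c in range(8)] for r in range(8)]
tokens = sample_patch_maxima(window, patch=4)
attended = sparse_self_attention(tokens)
print(len(tokens), len(attended))
```

In this toy setting, full attention over the 8 x 8 window would need 64 x 64 = 4096 pairwise scores, whereas attention over the 4 sampled points needs 4 x 4 = 16, which is the kind of saving the abstract attributes to sampling.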

English title: Nonlocal feature representation-embedded blurred image restoration

English abstract: Objective. Image deblurring is a classic low-level computer vision problem that aims to restore a sharp image from a blurry one. In recent years, convolutional neural networks (CNNs) have considerably advanced computer vision, and various CNN-based deblurring methods have been developed with remarkable results. Although the convolution operation is powerful in capturing local information, CNNs are limited in modeling long-range dependencies. By employing self-attention mechanisms, vision Transformers have shown a high ability to model long-range pixel relationships. However, most Transformer models designed for computer vision tasks involving high-resolution images use a local window self-attention mechanism, which contradicts the goal of employing Transformer structures to capture true long-range pixel dependencies. We review deblurring models capable of processing high-resolution images; most CNN-based and vision Transformer-based approaches can extract only spatially local features. Some studies obtain information from a larger receptive field by directly increasing the window size, but this approach incurs excessive computational overhead and lacks flexibility in feature extraction. To solve these problems, we propose a method that incorporates both local and nonlocal information into the network.

Method. We employ local feature representation (LFR) modules and nonlocal feature representation (NLFR) modules to extract enriched information. Most existing building blocks can already extract local information, so we treat these blocks directly as LFR modules. To obtain nonlocal information, we design a generic NLFR module that can be easily combined with an LFR module. The NLFR module consists of a nonlocal feature extraction (NLFE) block and an interblock transmission (IBT) mechanism. The NLFE block applies a nonlocal self-attention mechanism, which avoids interference from local information and texture details, captures purer nonlocal information, and considerably reduces computational complexity. To reduce the accumulation of local information in the NLFE block as network depth increases, we introduce an IBT mechanism for successive NLFE blocks, which provides a direct data flow for transferring nonlocal information. This design has two advantages: 1) the NLFR module ignores local texture details in the features when extracting information, so that the two kinds of information do not interfere with each other; 2) instead of computing the self-similarity of all pixels within the receptive field, the NLFR module adaptively samples the salient pixels, considerably reducing computational complexity. We select LeFF and ResBlock as LFR modules, combine each with the NLFR module, and design two models, NLCNet_L and NLCNet_R, for motion blur removal and defocus blur removal, respectively, both built on a single-stage UNet architecture.

Result. We verify the gain of each component of the NLFR module. A network built from the NLFR module combined with the LFR module obtains a peak signal-to-noise ratio (PSNR) gain of 0.89 dB over a network that uses only the LFR module as the building block. Adding the IBT mechanism improves PSNR by a further 0.09 dB. For a fair comparison, we build a baseline model that uses only ResBlock as the building block, with computational overhead and parameter count similar to those of the proposed network. The results demonstrate that the NLFR-combined ResBlock is more effective for constructing a deblurring network than ResBlock alone. Scalability experiments show that combining NLFR modules with existing building blocks, including convolutional residual blocks and a Transformer block, remarkably improves deblurring performance. In particular, two networks built from NLFR-combined LeFF blocks and ResBlocks achieve excellent results in single-image motion deblurring and dual-pixel defocus deblurring compared with other methods. Following a popular training protocol, NLCNet_L was trained on the GoPro dataset for 3,000 epochs and tested on the GoPro test set. Our method achieves the best results on the GoPro test set with the lowest computational complexity; compared with the previous method Uformer, it improves PSNR by 0.29 dB. We trained NLCNet_R on the DPD dataset for 200 epochs for the dual-pixel defocus deblurring experiments. In the combined scene category, we achieve excellent performance on all four metrics. Compared with Uformer, our method improves PSNR in indoor and outdoor scenes by 1.37 dB and 0.94 dB, respectively.

Conclusion. We propose a generic NLFR module that extracts genuinely nonlocal information from images and can be coupled with local information within the block to improve the expressive ability of the model. Through rational design, the network composed of NLFR modules achieves excellent performance at low computational cost, and the recovered images are visually clearer and more complete, especially along edge contours.
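The results above are all reported as PSNR gains in dB. As a reminder of what that metric measures, here is a minimal sketch of the standard PSNR definition, 10·log10(MAX² / MSE). The nested-list image format, the function name `psnr`, and the default peak value of 255 are assumptions for illustration; the abstract does not state which evaluation code or value range the authors used.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """PSNR in dB between two equally sized images given as nested lists."""
    ref = [p for row in reference for p in row]
    out = [p for row in restored for p in row]
    assert len(ref) == len(out), "images must have the same size"
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical 2 x 2 images that differ by 1 at every pixel (MSE = 1).
sharp = [[10, 20], [30, 40]]
deblurred = [[11, 21], [31, 41]]
print(psnr(sharp, deblurred))
```

Because PSNR is logarithmic in the mean squared error, a gain of 0.29 dB on a single image corresponds to an MSE ratio of 10^(-0.029), or roughly 0.935. Benchmark figures are usually averaged over many images, so this conversion is only indicative.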

English keywords:

motion blur; defocus blur; self-attention; non-local features; fusion network

Authors:

Hua Xia (华夏), Shu Ting (舒婷), Li Mingxin (李明欣), Shi Yu (时愈), Hong Hanyu (洪汉玉)

Affiliation:

School of Electrical and Information Engineering, Wuhan Institute of Technology, Wuhan 430205

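Keywords (translated from Chinese):

motion blur (运动模糊); defocus blur (散焦模糊); self-attention (自注意力); nonlocal features (非局部特征); fusion network (融合网络)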

Funding:

National Natural Science Foundation of China (three projects)

Project numbers:

61801337; 62471345; 62171329

Publication year:

2024

DOI:

10.11834/jig.230735

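Journal of Image and Graphics (中国图象图形学报)

Institute of Remote Sensing Applications, Chinese Academy of Sciences; China Society of Image and Graphics; Institute of Applied Physics and Computational Mathematics, Beijing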

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 1.111

ISSN: 1006-8961

Year, volume (issue): 2024, 29(10)