Two-stage Network Image Inpainting with Dual Generation of Structure and Texture
Objective: Existing image inpainting methods fail to achieve effective bidirectional interaction between structure and texture information, which leads to blurred textures and distorted structures when repairing images with large missing areas or complex textures.
Methods: A two-stage image inpainting network was proposed that employs a bidirectional coordinate attention fusion module and a Fourier feature aggregation module. First, a structure encoder-decoder and a texture encoder-decoder performed structure reconstruction and texture synthesis on the damaged image, generating a coarse inpainting result. Subsequently, the coarse result was fed into a refinement network, where the bidirectional coordinate attention fusion module and the Fourier feature aggregation module were used to repair the internal texture details of the image. The bidirectional coordinate attention fusion module was designed to enable bidirectional interaction between structure and texture information and thereby improve global consistency. The Fourier feature aggregation module was designed to capture global contextual information and strengthen the correlation between local image features, yielding the fine inpainting result. In addition, dual-stream discriminators were employed to estimate the feature statistics of structure and texture and to distinguish original images from generated ones.
Results: In experiments on the CelebA-HQ dataset, the method was compared with four image inpainting methods. Qualitatively, the face images it generated were clearer and more natural; quantitatively, it outperformed the compared methods in peak signal-to-noise ratio, structural similarity index, and Fréchet distance. Ablation experiments on the individual modules of the model also validated the effectiveness of the proposed components.
Conclusion: The proposed method effectively restores damaged face images and, in particular, generates images with reasonable structure and clear texture even under large occlusions.
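
For readers unfamiliar with the first of the reported metrics, the following is a minimal sketch of how peak signal-to-noise ratio is commonly computed between a reference image and a restored one. It is not taken from the paper: the function name `psnr`, the flat-list image representation, and the default peak value of 255 (8-bit pixels) are assumptions made for illustration.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB.

    `reference` and `restored` are flat sequences of pixel values of equal
    length (an illustrative representation, not the paper's data format).
    `max_value` is the largest possible pixel value, assumed 255 here.
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")

    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value indicates that the restored image is closer to the reference pixel by pixel, which is why it is usually reported alongside perceptual measures such as the structural similarity index.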