A Fusion Network for Infrared and Visible Images Based on Pre-trained Fixed Parameters and Deep Feature Modulation
To better exploit the complementary information in infrared and visible images and generate fused images that accord with human visual perception, a two-stage training strategy is proposed to obtain a novel infrared and visible image fusion network based on pre-trained fixed parameters and deep feature modulation (PDNet). Specifically, in the self-supervised pre-training stage, a large dataset of clear natural images serves as both the input and the target of a UNet backbone, and pre-training is carried out with an autoencoder scheme. The resulting encoder thus learns to extract multi-scale deep features from the input image, while the decoder learns to reconstruct them into an output image with minimal deviation from the input. In the unsupervised fusion training stage, the parameters of the pre-trained encoder and decoder are kept fixed, and a fusion module with a Transformer structure is inserted between them. Within the Transformer, the multi-head self-attention mechanism assigns weights to the deep features that the encoder extracts from the infrared and visible images in a principled manner, fusing and modulating the deep features at each scale into the manifold space of deep features of clear natural images, which in turn ensures the visual quality of the fused image after reconstruction by the decoder. Extensive experimental results demonstrate that, compared with current mainstream fusion models (algorithms), the proposed PDNet offers substantial advantages on a range of objective evaluation metrics and, in subjective visual evaluation, accords more closely with human visual perception.
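The two-stage strategy described above can be summarized in a minimal PyTorch sketch. The layer sizes, network depths, token layout, and loss functions below are illustrative assumptions (the abstract does not specify them), and the encoder/decoder are reduced to single-scale stand-ins for the multi-scale UNet modules; only the overall procedure — autoencoder pre-training on natural images, then freezing the encoder and decoder and training a Transformer fusion module between them — follows the text.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stand-in for the pre-trained UNet encoder (single scale for brevity)."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Stand-in for the pre-trained UNet decoder (features -> image)."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1),
        )
    def forward(self, f):
        return self.net(f)

class TransformerFusion(nn.Module):
    """Fusion module: multi-head self-attention re-weights IR/VIS deep features."""
    def __init__(self, ch=64, heads=4, layers=2):
        super().__init__()
        self.merge = nn.Conv2d(2 * ch, ch, 1)  # merge IR and VIS feature channels
        block = nn.TransformerEncoderLayer(d_model=ch, nhead=heads, batch_first=True)
        self.attn = nn.TransformerEncoder(block, num_layers=layers)
    def forward(self, f_ir, f_vis):
        f = self.merge(torch.cat([f_ir, f_vis], dim=1))   # (B, C, H, W)
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)             # (B, H*W, C) spatial tokens
        tokens = self.attn(tokens)                        # multi-head self-attention
        return tokens.transpose(1, 2).reshape(b, c, h, w)

# Stage 1: self-supervised pre-training; natural images are both input and target.
encoder, decoder = Encoder(), Decoder()
ae_params = list(encoder.parameters()) + list(decoder.parameters())
opt1 = torch.optim.Adam(ae_params, lr=1e-4)
natural = torch.rand(4, 1, 64, 64)                        # placeholder natural-image batch
opt1.zero_grad()
loss1 = nn.functional.l1_loss(decoder(encoder(natural)), natural)
loss1.backward(); opt1.step()

# Stage 2: freeze encoder/decoder; train only the Transformer fusion module.
for p in ae_params:
    p.requires_grad_(False)
fusion = TransformerFusion()
opt2 = torch.optim.Adam(fusion.parameters(), lr=1e-4)
ir, vis = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
opt2.zero_grad()
fused = decoder(fusion(encoder(ir), encoder(vis)))
# The paper's unsupervised fusion loss is not given in the abstract; an
# element-wise intensity target is a common placeholder choice here.
loss2 = nn.functional.l1_loss(fused, torch.maximum(ir, vis))
loss2.backward(); opt2.step()
```

Freezing the autoencoder in stage 2 means gradients flow only into the fusion module, so the decoder continues to map features from the natural-image manifold it was pre-trained on, which is the mechanism the abstract credits for the perceptual quality of the reconstructed fused image.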