
A Fusion Network for Infrared and Visible Images Based on Pre-trained Fixed Parameters and Deep Feature Modulation

To better exploit the complementary information in infrared and visible images and produce fused images that match the characteristics of human visual perception, a two-stage training strategy is used to build an infrared and visible image fusion network based on pre-trained fixed parameters and deep feature modulation (PDNet). Specifically, in the self-supervised pre-training stage, a large set of clear natural images serves as both the input and the target output of a UNet backbone, and pre-training is carried out as an autoencoder. The resulting encoder module can effectively extract multi-scale deep features from an input image, while the decoder module reconstructs them into an output image that deviates only minimally from the input. In the unsupervised fusion training stage, the parameters of the pre-trained encoder and decoder modules are kept fixed, and a fusion module built on a Transformer structure is inserted between them. The multi-head self-attention mechanism in the Transformer assigns appropriate weights to the deep features that the encoder extracts from the infrared and visible images, fusing and modulating them at multiple scales onto the manifold of natural-image deep features, which in turn ensures the visual quality of the fused image reconstructed by the decoder. Extensive experiments show that, compared with current mainstream fusion models (algorithms), the proposed PDNet offers clear advantages on multiple objective evaluation metrics and also agrees better with human visual perception in subjective evaluation.
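For readers who want a concrete picture of the two-stage design described above, the following PyTorch sketch shows one way the pieces could fit together: a UNet-style encoder/decoder pre-trained as an autoencoder and then frozen, with trainable Transformer-based fusion modules inserted between them. All class names (Encoder, Decoder, TransformerFusion, PDNet), channel widths, layer counts, the number of attention heads, and the token-averaging fusion rule are illustrative assumptions, not the authors' actual PDNet implementation.

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Two 3x3 convolutions with ReLU, as commonly used in UNet-style networks."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)


class Encoder(nn.Module):
    """Extracts multi-scale deep features; pre-trained on clear natural images (stage 1)."""
    def __init__(self, chs=(1, 32, 64, 128)):
        super().__init__()
        self.blocks = nn.ModuleList(ConvBlock(chs[i], chs[i + 1]) for i in range(len(chs) - 1))
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        feats = []
        for i, blk in enumerate(self.blocks):
            x = blk(x)
            feats.append(x)                      # one feature map per scale
            if i < len(self.blocks) - 1:
                x = self.pool(x)
        return feats


class Decoder(nn.Module):
    """Reconstructs an image from the multi-scale features; also pre-trained, then frozen."""
    def __init__(self, chs=(128, 64, 32)):
        super().__init__()
        self.ups = nn.ModuleList(nn.ConvTranspose2d(chs[i], chs[i + 1], 2, stride=2)
                                 for i in range(len(chs) - 1))
        self.blocks = nn.ModuleList(ConvBlock(2 * chs[i + 1], chs[i + 1])
                                    for i in range(len(chs) - 1))
        self.out = nn.Conv2d(chs[-1], 1, 1)

    def forward(self, feats):
        x = feats[-1]
        for up, blk, skip in zip(self.ups, self.blocks, reversed(feats[:-1])):
            x = blk(torch.cat([up(x), skip], dim=1))   # UNet-style skip connection
        return self.out(x)


class TransformerFusion(nn.Module):
    """Fuses infrared and visible features at one scale with multi-head self-attention."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)

    def forward(self, f_ir, f_vis):
        b, c, h, w = f_ir.shape
        # Stack the two modalities along the spatial axis and let self-attention re-weight
        # all tokens jointly (a real system would use windowed/patch attention for large maps),
        # then average the two modality slots back into a single fused feature map.
        tokens = torch.cat([f_ir, f_vis], dim=2).flatten(2).transpose(1, 2)  # (B, 2*H*W, C)
        tokens = self.attn(tokens)
        return tokens.transpose(1, 2).reshape(b, c, 2, h, w).mean(dim=2)


class PDNet(nn.Module):
    """Stage 2: frozen encoder/decoder with trainable fusion modules between them."""
    def __init__(self, encoder, decoder, dims=(32, 64, 128)):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder
        for p in self.encoder.parameters():      # pre-trained parameters stay fixed
            p.requires_grad_(False)
        for p in self.decoder.parameters():
            p.requires_grad_(False)
        self.fusion = nn.ModuleList(TransformerFusion(d) for d in dims)

    def forward(self, ir, vis):
        f_ir, f_vis = self.encoder(ir), self.encoder(vis)
        fused = [fuse(a, b) for fuse, a, b in zip(self.fusion, f_ir, f_vis)]
        return self.decoder(fused)


if __name__ == "__main__":
    # Stage 1 would train Encoder + Decoder as an autoencoder on clear natural images
    # (e.g. with a reconstruction loss); here we only check that the shapes line up.
    model = PDNet(Encoder(), Decoder())
    ir, vis = torch.rand(1, 1, 32, 32), torch.rand(1, 1, 32, 32)
    print(model(ir, vis).shape)  # torch.Size([1, 1, 32, 32])
```

Because the pre-trained modules are frozen, only the fusion modules receive gradients during the unsupervised fusion stage, which is what keeps the fused features close to the natural-image feature manifold learned in stage 1.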

Infrared and visible images; Image fusion; Self-supervised pre-training; Unsupervised fusion training; Fixed parameters; Deep feature modulation

Xu Shaoping, Zhou Changfei, Xiao Jian, Tao Wuyong, Dai Tianyu


School of Mathematics and Computer Science, Nanchang University, Nanchang 330031, China

National Natural Science Foundation of China (62162043)

2024

Journal of Electronics & Information Technology
Sponsors: Institute of Electronics, Chinese Academy of Sciences; Department of Information Sciences, National Natural Science Foundation of China

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.302
ISSN: 1009-5896
Year, Volume (Issue): 2024, 46(8)