Infrared and Visible Image Fusion Based on Autoencoder Composed of CNN-transformer
Autoencoder-based image fusion models have attracted growing attention because they do not require manually designed fusion rules. However, most autoencoder-based fusion networks use a two-stream CNN with identical branches as the encoder. Because convolution has a local receptive field, such encoders cannot extract global features, and because both branches share the same structure, they cannot extract the features unique to infrared and to visible images. This paper constructs a novel autoencoder-based image fusion network consisting of an encoder module, a fusion module, and a decoder module. In the encoder module, a CNN and a Transformer are combined to capture the local and global features of the source images simultaneously. In addition, novel contrast enhancement and gradient enhancement feature extraction blocks are designed for infrared and visible images, respectively, to preserve the information specific to each source image. The feature maps produced by the encoder module are concatenated by the fusion module and fed to the decoder module to obtain the fused image. Experimental results on three datasets show that the proposed network better preserves both the clear targets of infrared images and the detailed information of visible images, and that it outperforms some state-of-the-art methods in both subjective and objective evaluation. Moreover, the fused images produced by the proposed network achieve the highest mean average precision in target detection, which demonstrates that image fusion is beneficial for downstream tasks.
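
The data flow summarized above (modality-specific feature extraction followed by channel-wise concatenation) can be illustrated with a minimal sketch. It is not the network proposed in the paper: the learned contrast and gradient enhancement blocks are replaced here by hand-crafted stand-ins (a min-max contrast stretch and a Sobel gradient magnitude), the CNN-Transformer encoder and the learned decoder are omitted, and all function names are hypothetical. Only the Python standard library is used, with images represented as lists of rows.

```python
# Illustrative data flow only. The hand-crafted operators below are assumed
# stand-ins for the paper's learned blocks, not the authors' implementation.

def contrast_stretch(img):
    """Min-max stretch to [0, 1]; assumed stand-in for contrast enhancement."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [[0.0 for _ in row] for row in img]
    return [[(v - lo) / (hi - lo) for v in row] for row in img]


def sobel_gradient(img):
    """Sobel magnitude |Gx| + |Gy| with zero padding; assumed stand-in for
    gradient enhancement."""
    h, w = len(img), len(img[0])

    def px(r, c):
        # Pixels outside the image are treated as zero.
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = (px(r - 1, c + 1) + 2 * px(r, c + 1) + px(r + 1, c + 1)) \
               - (px(r - 1, c - 1) + 2 * px(r, c - 1) + px(r + 1, c - 1))
            gy = (px(r + 1, c - 1) + 2 * px(r + 1, c) + px(r + 1, c + 1)) \
               - (px(r - 1, c - 1) + 2 * px(r - 1, c) + px(r - 1, c + 1))
            out[r][c] = abs(gx) + abs(gy)
    return out


def encode_infrared(ir):
    """Infrared branch: raw intensity plus a contrast-enhanced channel."""
    return [ir, contrast_stretch(ir)]


def encode_visible(vis):
    """Visible branch: raw intensity plus a gradient-enhanced channel."""
    return [vis, sobel_gradient(vis)]


def fuse_by_concatenation(feats_ir, feats_vis):
    """Fusion module: concatenate the two feature stacks along channels."""
    return feats_ir + feats_vis


# Toy 4x4 inputs: a bright infrared target and a visible vertical edge.
ir = [[0.2, 0.2, 0.2, 0.2],
      [0.2, 0.9, 0.9, 0.2],
      [0.2, 0.9, 0.9, 0.2],
      [0.2, 0.2, 0.2, 0.2]]
vis = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]

fused = fuse_by_concatenation(encode_infrared(ir), encode_visible(vis))
print(len(fused), "channels of size", len(fused[0]), "x", len(fused[0][0]))
```

In the network described in the abstract, the concatenated feature maps would then be passed to a learned decoder to reconstruct the fused image; that step is deliberately left out of this sketch.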