Infrared and Visible Image Fusion Based on Multi-granularity Cross-modal Feature Enhancement
This work proposes an image fusion algorithm that combines a Transformer and a CNN to address the insufficient cross-modal feature integration in infrared and visible image fusion. To fully capture global-context deep features, a deep feature extraction module built from Transformer blocks is designed. The multi-granularity global-context features extracted by the Transformer blocks are fed into a cross-modal feature enhancement module (CFEB), which integrates the dual-modality deep features in a top-down manner. The integrated fused features are then concatenated with the dual-modality features along the channel dimension to reconstruct the fused image. Extensive qualitative and quantitative experiments on the public MSRS dataset show that the proposed method fully integrates the complementary information of infrared and visible images and achieves strong fusion performance.
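The overall pipeline (CNN shallow features, Transformer global-context features, cross-modal enhancement, channel-wise concatenation, reconstruction) can be sketched in PyTorch. This is a minimal single-scale illustration, not the authors' implementation: the class names (`CFEB`, `FusionNet`), the sigmoid-gated enhancement inside `CFEB`, and all layer sizes are assumptions, and the paper's multi-granularity, top-down design would repeat the CFEB step across several feature scales.

```python
import torch
import torch.nn as nn


class CFEB(nn.Module):
    """Hypothetical cross-modal feature enhancement block: each modality's
    features are re-weighted by a gate computed from the other modality."""
    def __init__(self, ch):
        super().__init__()
        self.gate_ir = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.gate_vis = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, f_ir, f_vis):
        # infrared features enhanced by visible gate, and vice versa
        return f_ir * self.gate_vis(f_vis) + f_vis * self.gate_ir(f_ir)


class FusionNet(nn.Module):
    """Single-scale sketch of the fusion pipeline described in the abstract."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Conv2d(1, ch, 3, padding=1)          # shallow CNN encoder
        self.transformer = nn.TransformerEncoderLayer(      # global-context deep features
            d_model=ch, nhead=4, batch_first=True)
        self.cfeb = CFEB(ch)
        # fused features concatenated with both modality features -> 3*ch channels
        self.dec = nn.Conv2d(3 * ch, 1, 3, padding=1)

    def deep(self, x):
        f = torch.relu(self.enc(x))                         # (B, C, H, W) CNN features
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)               # (B, H*W, C) token sequence
        g = self.transformer(tokens).transpose(1, 2).reshape(b, c, h, w)
        return f, g

    def forward(self, ir, vis):
        f_ir, g_ir = self.deep(ir)
        f_vis, g_vis = self.deep(vis)
        fused = self.cfeb(g_ir, g_vis)                      # cross-modal integration
        # channel-dimension concatenation before reconstruction
        out = self.dec(torch.cat([fused, f_ir, f_vis], dim=1))
        return torch.sigmoid(out)


net = FusionNet()
ir = torch.rand(1, 1, 32, 32)   # toy single-channel infrared input
vis = torch.rand(1, 1, 32, 32)  # toy single-channel visible input
fused = net(ir, vis)
print(fused.shape)              # same spatial size as the inputs
```

In practice the visible branch would take a 3-channel input and the network would be trained with intensity- and gradient-preserving losses, but those details are outside the scope of this sketch.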