Infrared and visible image fusion based on transformer and spatial attention model
Currently, applications of convolutional neural networks (CNNs) to the task of fusing infrared and visible images have achieved good fusion results. Many of these methods are based on auto-encoder network architectures, which are trained in a self-supervised manner and rely on hand-designed fusion strategies to fuse features in the testing phase. However, existing auto-encoder-based methods rarely make full use of both shallow and deep features, and CNNs are limited by their receptive fields, which makes it difficult to establish long-range dependencies and leads to the loss of global information. In contrast, the Transformer, with the help of its self-attention mechanism, can establish long-range dependencies and effectively capture global contextual information. As for fusion strategies, most methods are designed in a crude way and do not specifically account for the characteristics of the different modalities. Therefore, CNN and Transformer are combined in the encoder so that it can extract more comprehensive features, and an attention model is applied to the fusion strategy to refine the features in a more fine-grained way. Experimental results show that the proposed fusion algorithm achieves excellent results in both subjective and objective evaluations compared with other image fusion algorithms.
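To make the two ideas above concrete, the following is a minimal PyTorch sketch of a hybrid CNN-Transformer encoder and a spatial-attention fusion rule. The module names, channel count, single transformer layer, and L1-norm activity map used here are illustrative assumptions, not the exact design of the paper.

```python
# Minimal sketch: hybrid CNN-Transformer encoder + spatial-attention fusion.
# All hyperparameters (64 channels, 4 heads, 1 layer) are assumptions.
import torch
import torch.nn as nn

class HybridEncoder(nn.Module):
    """CNN stem for local/shallow features + Transformer for global context."""
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, x):
        f = self.cnn(x)                        # local features: (B, C, H, W)
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        g = self.transformer(tokens)           # long-range dependencies
        g = g.transpose(1, 2).reshape(b, c, h, w)
        return f + g                           # combine local and global cues

def spatial_attention_fuse(f_ir, f_vis):
    """Fuse features with per-pixel weights from an L1-norm activity map."""
    a_ir = f_ir.abs().sum(dim=1, keepdim=True)   # activity map: (B, 1, H, W)
    a_vis = f_vis.abs().sum(dim=1, keepdim=True)
    w = torch.softmax(torch.cat([a_ir, a_vis], dim=1), dim=1)
    return w[:, 0:1] * f_ir + w[:, 1:2] * f_vis  # weighted per-pixel sum

if __name__ == "__main__":
    enc = HybridEncoder()
    ir, vis = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
    fused = spatial_attention_fuse(enc(ir), enc(vis))
    print(fused.shape)  # torch.Size([1, 64, 64, 64])
```

In this sketch the per-pixel softmax weights let the fused map favor whichever modality is more active at each spatial location, which is one common way to make an attention-based fusion strategy modality-aware rather than a fixed average.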