Semantic Segmentation of Dual-Source Remote Sensing Images Based on Gated Attention and Multiscale Residual Fusion
The semantic segmentation of remote sensing images is a crucial step in geographic-object-based remote sensing image analysis. Combining remote sensing image data with elevation data effectively enhances feature complementarity, thereby improving pixel-level segmentation accuracy. This study proposes a dual-source remote sensing image semantic segmentation model, STAM-SegNet, that leverages a Swin Transformer backbone network to extract multiscale features. The proposed model integrates an adaptive gated attention mechanism and a multiscale residual fusion strategy. The adaptive gated attention mechanism comprises gated channel attention and gated spatial attention. Gated channel attention strengthens the correlation between dual-source features through a competition/cooperation mechanism, effectively extracting the complementary features of the two sources. In contrast, gated spatial attention uses spatial contextual information to dynamically filter high-level semantic features and select accurate detail features. The multiscale residual fusion strategy captures multiscale contextual information via multiscale refinement and a residual structure, thereby emphasizing detail features such as shadows and boundaries and improving the model's training speed. Experiments on the Vaihingen and Potsdam datasets demonstrate that the proposed model achieves average F1-scores of 89.66% and 92.75%, respectively, surpassing networks such as DeepLabV3+, UperNet, DANet, TransUNet, and Swin-UNet in segmentation accuracy.
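To make the channel-gating idea concrete, the following is a minimal PyTorch sketch of how a gated channel attention block for dual-source (image and elevation) features could be realized. The module name, layer sizes, and the softmax-based competition/cooperation gating are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class GatedChannelAttention(nn.Module):
    """Hypothetical sketch: fuses image (RGB) and elevation (DSM) feature
    maps with per-channel gates. Illustrative only, not the authors' code."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Shared bottleneck MLP maps pooled channel descriptors of both
        # sources to per-channel gate logits for each source.
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * channels),
        )

    def forward(self, f_rgb: torch.Tensor, f_dsm: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = f_rgb.shape
        # Global average pooling yields one descriptor per channel per source.
        desc = torch.cat([f_rgb.mean(dim=(2, 3)), f_dsm.mean(dim=(2, 3))], dim=1)
        logits = self.mlp(desc).view(b, 2, c)
        # Softmax across the two sources: channels "compete" for weight,
        # while the shared MLP lets the sources "cooperate" via joint statistics.
        gates = logits.softmax(dim=1)
        g_rgb = gates[:, 0].view(b, c, 1, 1)
        g_dsm = gates[:, 1].view(b, c, 1, 1)
        return g_rgb * f_rgb + g_dsm * f_dsm

# Usage example with assumed feature shapes:
# gca = GatedChannelAttention(channels=256)
# fused = gca(torch.randn(2, 256, 32, 32), torch.randn(2, 256, 32, 32))
```

The softmax over the source axis makes the two gates sum to one per channel, so a channel that is informative in the elevation stream is automatically down-weighted in the image stream, which is one plausible reading of the competition/cooperation mechanism described above.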
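Similarly, a minimal sketch of the multiscale residual fusion idea, assuming parallel dilated convolutions for multiscale refinement and an additive skip connection for the residual structure; the dilation rates and layer layout are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscaleResidualFusion(nn.Module):
    """Hypothetical sketch: multiscale refinement plus a residual path.
    Kernel/dilation choices are assumptions, not the paper's design."""

    def __init__(self, channels: int):
        super().__init__()
        # Parallel dilated convolutions capture context at several scales.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in (1, 2, 4)
        ])
        self.project = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Multiscale refinement: concatenate branch outputs, project back.
        refined = self.project(torch.cat([b(x) for b in self.branches], dim=1))
        # Residual structure preserves detail features (e.g., shadows and
        # boundaries) and eases optimization, which can speed up training.
        return F.relu(x + refined)
```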