Research on multi-scale remote sensing image change detection using Swin Transformer
Owing to the complexity of terrain information and the diversity of change detection data, it is difficult to ensure adequate and effective feature extraction from remote sensing images, which lowers the reliability of the results produced by change detection methods. Although convolutional neural networks are widely used in remote sensing change detection because they extract semantic features effectively, the inherent locality of the convolution operation limits the receptive field, making it difficult to capture global spatiotemporal information and thus to model long-range dependencies in the feature space. To capture long-distance semantic dependencies and extract deep global semantic features, a multi-scale feature fusion network based on the Swin Transformer, SwinChangeNet, was designed. First, SwinChangeNet uses a Siamese (twin) multi-level Swin Transformer feature encoder for long-range context modeling. Second, a feature difference extraction module is introduced into the encoder to compute multi-level feature differences between the pre-change and post-change images at different scales, and the resulting multi-scale feature maps are combined through an adaptive fusion layer. Finally, residual connections and a channel attention mechanism are introduced to decode the fused features and generate a complete and accurate change map. The proposed model was compared with seven classic and state-of-the-art change detection methods on two publicly available datasets, CDD and CD-Data_GZ, and achieved the best performance on both. On the CDD dataset, compared with the second-best model, the F1 score increased by 1.11% and the accuracy by 2.38%. On the CD-Data_GZ dataset, compared with the second-best model, the F1 score, accuracy, and recall increased by 4.78%, 4.32%, and 4.09%, respectively, a significant improvement. These comparative results demonstrate the superior detection performance of the proposed model, and ablation experiments further validate the stability and effectiveness of each improved module. In conclusion, this work addresses the task of remote sensing image change detection by introducing the Swin Transformer structure, which enables the network to encode both local and global features of remote sensing images more effectively, yielding more accurate detection results while ensuring that the network converges efficiently on datasets with a wide variety of land cover types.
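To make the described pipeline concrete, the following PyTorch code is a minimal, illustrative sketch of the Siamese encode, per-scale feature differencing, adaptive multi-scale fusion, and channel-attention decoding steps summarized above. The class and parameter names (StubBackbone, SiameseChangeNet, fuse_ch, etc.) are hypothetical, and a simple strided-convolution feature pyramid stands in for the paper's hierarchical Swin Transformer encoder purely to keep the example self-contained; it is not the authors' implementation.

# Minimal sketch of a SwinChangeNet-style change detection pipeline (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F


class StubBackbone(nn.Module):
    """Stand-in for a hierarchical (Swin-like) encoder: four stages, each halving
    the spatial resolution and increasing the channel width."""
    def __init__(self, in_ch=3, dims=(64, 128, 256, 512)):
        super().__init__()
        self.stages = nn.ModuleList()
        prev = in_ch
        for d in dims:
            self.stages.append(nn.Sequential(
                nn.Conv2d(prev, d, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(d), nn.GELU()))
            prev = d

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # multi-scale feature pyramid


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention used in the decoder."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))           # global average pool -> channel weights
        return x * w[:, :, None, None]


class SiameseChangeNet(nn.Module):
    def __init__(self, dims=(64, 128, 256, 512), fuse_ch=64):
        super().__init__()
        self.encoder = StubBackbone(dims=dims)    # shared weights = Siamese design
        # per-scale difference heads: concat(t1, t2) features -> common width
        self.diff_heads = nn.ModuleList(
            nn.Conv2d(2 * d, fuse_ch, kernel_size=1) for d in dims)
        # learnable per-scale weights for adaptive fusion
        self.scale_logits = nn.Parameter(torch.zeros(len(dims)))
        self.attn = ChannelAttention(fuse_ch)
        self.decoder = nn.Sequential(
            nn.Conv2d(fuse_ch, fuse_ch, 3, padding=1), nn.BatchNorm2d(fuse_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(fuse_ch, 1, kernel_size=1))  # binary change logits

    def forward(self, img_t1, img_t2):
        f1, f2 = self.encoder(img_t1), self.encoder(img_t2)   # shared encoder
        target_size = f1[0].shape[-2:]                        # finest scale
        weights = torch.softmax(self.scale_logits, dim=0)
        fused = 0
        for w, head, a, b in zip(weights, self.diff_heads, f1, f2):
            d = head(torch.cat([a, b], dim=1))                # feature difference
            d = F.interpolate(d, size=target_size, mode='bilinear',
                              align_corners=False)
            fused = fused + w * d                             # adaptive fusion
        fused = fused + self.attn(fused)                      # residual + channel attention
        logits = self.decoder(fused)
        # upsample back to the input resolution for a dense change map
        return F.interpolate(logits, size=img_t1.shape[-2:], mode='bilinear',
                             align_corners=False)


if __name__ == "__main__":
    net = SiameseChangeNet()
    t1 = torch.randn(1, 3, 256, 256)
    t2 = torch.randn(1, 3, 256, 256)
    print(net(t1, t2).shape)  # torch.Size([1, 1, 256, 256])

In this sketch the weight-shared encoder plays the role of the Siamese Swin Transformer branches, the 1x1 difference heads and softmax-weighted sum correspond to the feature difference extraction and adaptive fusion layer, and the residual connection around the channel attention block mirrors the decoder design described in the abstract.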