Research on multi-scale remote sensing image change detection using Swin Transformer
Owing to the complexity of terrain information and the diversity of change detection data, it is difficult to ensure adequate and effective feature extraction from remote sensing images, which lowers the reliability of the results produced by change detection methods. Although convolutional neural networks are widely used in remote sensing change detection because they extract semantic features effectively, the inherent locality of the convolution operation limits the receptive field, making it difficult to capture global spatiotemporal information and thus to model long-range dependencies in the feature space. To capture long-distance semantic dependencies and extract deep global semantic features, a multi-scale feature fusion network based on the Swin Transformer, SwinChangeNet, was designed. First, SwinChangeNet uses a Siamese (twin) multi-level Swin Transformer feature encoder for long-range context modeling. Second, a feature difference extraction module is introduced into the encoder to compute multi-level feature differences between the pre-change and post-change images at different scales, and the resulting multi-scale feature maps are combined through an adaptive fusion layer. Finally, residual connections and a channel attention mechanism are introduced to decode the fused features and generate a complete and accurate change map. The proposed model was compared with seven classic and state-of-the-art change detection methods on two publicly available datasets, CDD and CD-Data_GZ, and achieved the best performance on both. On the CDD dataset, compared with the second-best model, the F1 score increased by 1.11% and the accuracy by 2.38%. On the CD-Data_GZ dataset, compared with the second-best model, the F1 score, accuracy, and recall increased by 4.78%, 4.32%, and 4.09%, respectively, a significant improvement. These comparative results demonstrate the superior detection performance of the proposed model, and ablation experiments further validate the stability and effectiveness of each improved module. In conclusion, this work addresses the task of remote sensing image change detection by introducing the Swin Transformer structure, which enables the network to encode both local and global features of remote sensing images more effectively, yielding more accurate detection results while ensuring that the network converges efficiently on datasets with a wide variety of land cover types.
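To make the described pipeline concrete, the following PyTorch code is a minimal, illustrative sketch of the Siamese encode, per-scale feature differencing, adaptive multi-scale fusion, and channel-attention decoding steps summarized above. The class and parameter names (StubBackbone, SiameseChangeNet, fuse_ch, etc.) are hypothetical, and a simple strided-convolution feature pyramid stands in for the paper's hierarchical Swin Transformer encoder purely to keep the example self-contained; it is not the authors' implementation.

# Minimal sketch of a SwinChangeNet-style change detection pipeline (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F


class StubBackbone(nn.Module):
    """Stand-in for a hierarchical (Swin-like) encoder: four stages, each halving
    the spatial resolution and increasing the channel width."""
    def __init__(self, in_ch=3, dims=(64, 128, 256, 512)):
        super().__init__()
        self.stages = nn.ModuleList()
        prev = in_ch
        for d in dims:
            self.stages.append(nn.Sequential(
                nn.Conv2d(prev, d, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(d), nn.GELU()))
            prev = d

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # multi-scale feature pyramid


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention used in the decoder."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))           # global average pool -> channel weights
        return x * w[:, :, None, None]


class SiameseChangeNet(nn.Module):
    def __init__(self, dims=(64, 128, 256, 512), fuse_ch=64):
        super().__init__()
        self.encoder = StubBackbone(dims=dims)    # shared weights = Siamese design
        # per-scale difference heads: concat(t1, t2) features -> common width
        self.diff_heads = nn.ModuleList(
            nn.Conv2d(2 * d, fuse_ch, kernel_size=1) for d in dims)
        # learnable per-scale weights for adaptive fusion
        self.scale_logits = nn.Parameter(torch.zeros(len(dims)))
        self.attn = ChannelAttention(fuse_ch)
        self.decoder = nn.Sequential(
            nn.Conv2d(fuse_ch, fuse_ch, 3, padding=1), nn.BatchNorm2d(fuse_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(fuse_ch, 1, kernel_size=1))  # binary change logits

    def forward(self, img_t1, img_t2):
        f1, f2 = self.encoder(img_t1), self.encoder(img_t2)   # shared encoder
        target_size = f1[0].shape[-2:]                        # finest scale
        weights = torch.softmax(self.scale_logits, dim=0)
        fused = 0
        for w, head, a, b in zip(weights, self.diff_heads, f1, f2):
            d = head(torch.cat([a, b], dim=1))                # feature difference
            d = F.interpolate(d, size=target_size, mode='bilinear',
                              align_corners=False)
            fused = fused + w * d                             # adaptive fusion
        fused = fused + self.attn(fused)                      # residual + channel attention
        logits = self.decoder(fused)
        # upsample back to the input resolution for a dense change map
        return F.interpolate(logits, size=img_t1.shape[-2:], mode='bilinear',
                             align_corners=False)


if __name__ == "__main__":
    net = SiameseChangeNet()
    t1 = torch.randn(1, 3, 256, 256)
    t2 = torch.randn(1, 3, 256, 256)
    print(net(t1, t2).shape)  # torch.Size([1, 1, 256, 256])

In this sketch the weight-shared encoder plays the role of the Siamese Swin Transformer branches, the 1x1 difference heads and softmax-weighted sum correspond to the feature difference extraction and adaptive fusion layer, and the residual connection around the channel attention block mirrors the decoder design described in the abstract.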