Classification of High-Resolution Remote Sensing Image Based on Swin Transformer and Convolutional Neural Network
He Xiaoying 1, Xu Weiming 1, Pan Kaixiang 1, Wang Juan 1, Li Ziwei 1
Author Information
- 1. The Academy of Digital China (Fujian), Fuzhou University, Fuzhou 350108, Fujian, China; Key Laboratory of Spatial Data Mining and Information Sharing of the Ministry of Education, Fuzhou University, Fuzhou 350002, Fujian, China; National & Local Joint Engineering Research Center of Geospatial Information Technology, Fuzhou University, Fuzhou 350002, Fujian, China
Abstract
Existing deep learning-based remote sensing interpretation methods find it challenging to directly obtain global information, resulting in blurred object edges and low classification accuracy between similar classes. This study proposes a semantic segmentation model called SRAU-Net based on the Swin Transformer and a convolutional neural network. SRAU-Net adopts a Swin Transformer encoder-decoder framework with a U-Net shape and introduces several improvements to address the limitations of previous methods. First, the Swin Transformer and a convolutional neural network are used to construct a dual-branch encoder, which captures spatial details at different scales and complements the context features, resulting in higher classification accuracy and sharper object edges. Second, a feature fusion module is designed as a bridge for the dual-branch encoder. This module fuses global and local features in the channel and spatial dimensions, improving the segmentation accuracy for small target objects. Moreover, the proposed SRAU-Net model incorporates a feature enhancement module that uses attention mechanisms to adaptively fuse features from the encoder and decoder and enhances the aggregation of spatial and semantic features, further improving the ability of the model to extract features from remote sensing images. The effectiveness of the proposed SRAU-Net model is demonstrated on the ISPRS Vaihingen dataset for land cover classification. The results show that SRAU-Net outperforms the comparison models in overall accuracy and F1 score, achieving 92.06% and 86.90%, respectively. Notably, SRAU-Net excels in extracting object edge information and accurately classifying small-scale regions, improving overall classification accuracy by 2.57 percentage points over the original model. Furthermore, it effectively distinguishes remote sensing objects with similar characteristics, such as trees and low vegetation.
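The abstract describes a feature fusion module that merges the global (Swin Transformer branch) and local (CNN branch) features along both the channel and spatial dimensions. The paper's exact formulation is not given here, so the following is only a minimal NumPy sketch of one common channel-plus-spatial attention fusion pattern (channel gating from global average pooling, spatial gating from a channel-wise mean); the function name, shapes, and gating choices are illustrative assumptions, not SRAU-Net's actual design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_features(global_feat, local_feat):
    """Fuse a global (Transformer) and a local (CNN) feature map.

    Both inputs have shape (C, H, W). This is an assumed, generic
    channel-and-spatial attention fusion, not SRAU-Net's published one:
    the summed features are reweighted by a per-channel gate (global
    average pooling -> sigmoid) and a per-pixel gate (channel-wise
    mean -> sigmoid).
    """
    summed = global_feat + local_feat                 # (C, H, W)
    # Channel attention: one weight per channel.
    channel_gate = sigmoid(summed.mean(axis=(1, 2)))  # (C,)
    # Spatial attention: one weight per pixel.
    spatial_gate = sigmoid(summed.mean(axis=0))       # (H, W)
    # Broadcast both gates over the summed features.
    return summed * channel_gate[:, None, None] * spatial_gate[None, :, :]
```

Because both gates lie in (0, 1), the output keeps the input shape and attenuates, rather than amplifies, the summed responses; a learned version would replace the parameter-free pooling with small convolutional or fully connected layers.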
Keywords
high-resolution remote sensing image / convolutional neural network / Swin Transformer / feature fusion / semantic segmentation
Funding
National Natural Science Foundation of China (41801324)
Guiding Project of the Fujian Provincial Department of Science and Technology (2017Y0055)
Guiding Project of the Fujian Provincial Department of Science and Technology (2022H0009)
Industry-University Cooperative Education Program of the Ministry of Education (202101119001)
Publication Year
2024