Traditional methods for extracting buildings from high-resolution remote sensing images suffer from low accuracy, omissions, and false detections. Most existing methods rely on convolutional neural networks, and because convolution operations are local, they struggle to capture global context information directly. Inspired by the strong global modeling capability of Swin Transformer, this paper proposes a building extraction method for remote sensing images based on the Swin Transformer model. The method adopts a U-net architecture and replaces the ordinary convolutions with Swin Transformer blocks to extract context features, learning both local and global semantic features. Experiments are conducted on the WHU high-resolution remote sensing image dataset, and the method is compared with U-net, U-net++, and AttentionUnet. The results show that the proposed method effectively improves the correctness and accuracy of building extraction.
Key words
building extraction/Swin Transformer/remote sensing images/U-net
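
The abstract states that Swin Transformer blocks replace ordinary convolutions for context feature extraction. As a rough illustration of the window mechanism such blocks rely on, the sketch below partitions a feature map into non-overlapping windows, applies the cyclic shift used between successive blocks, and restores the original layout. It is a minimal NumPy sketch and not the implementation used in this paper: the function names `window_partition`, `window_reverse`, and `cyclic_shift`, the 8×8 feature map, and the window size of 4 are all assumptions chosen for illustration, and the attention computation itself is omitted.

```python
import numpy as np


def window_partition(x, window_size):
    """Split a feature map of shape (H, W, C) into non-overlapping windows.

    Returns an array of shape (num_windows, window_size, window_size, C).
    H and W are assumed to be multiples of window_size.
    """
    h, w, c = x.shape
    assert h % window_size == 0 and w % window_size == 0
    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)
    # Bring the two window-grid axes together, then flatten them.
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, c)


def window_reverse(windows, window_size, h, w):
    """Inverse of window_partition: rebuild the (H, W, C) feature map."""
    c = windows.shape[-1]
    x = windows.reshape(h // window_size, w // window_size, window_size, window_size, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h, w, c)


def cyclic_shift(x, shift):
    """Roll the feature map up and left by `shift` pixels (shifted windows)."""
    return np.roll(x, shift=(-shift, -shift), axis=(0, 1))


# Hypothetical 8x8 single-channel feature map with values 0..63.
feat = np.arange(8 * 8, dtype=np.float32).reshape(8, 8, 1)

wins = window_partition(feat, 4)          # regular windows
restored = window_reverse(wins, 4, 8, 8)  # round trip back to (8, 8, 1)
shifted_wins = window_partition(cyclic_shift(feat, 2), 4)  # shifted windows
```

In a Swin Transformer block, self-attention would be computed independently inside each window; alternating regular and shifted windows lets information cross window boundaries, which is how the architecture combines local and global context.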