Integration of CNN and Transformer for high-resolution remote sensing image building extraction: A dual-stream network

LIU Yuxin (刘宇鑫)¹, MENG Yu (孟瑜)², DENG Yupeng (邓毓弸)², CHEN Jingbo (陈静波)², LIU Diyou (刘帝佑)²

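Author information

1. National Engineering Research Center for Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049
2. National Engineering Research Center for Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094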

Abstract

Convolutional neural networks (CNNs) and Transformers are widely used for building extraction from high-resolution remote sensing images. However, CNNs still struggle to model long-range spatial dependencies, which leaves internal holes in the extracted buildings. Transformers, in turn, are weak at capturing local spatial detail, which tends to blur building edges and causes small buildings to be missed.

To address these problems, this study proposes a dual-stream network for building extraction from high-resolution remote sensing images, named ILGS-Net (Network for the Integration of Local and Global Features Stream). The model combines a CNN with a Transformer and uses multilevel local-global feature fusion modules to fuse the local detail features and the global context features of buildings efficiently. In addition, an edge loss function is introduced into the objective function to constrain model training, which improves the localization accuracy of building boundaries.

Experiments on three high-resolution building datasets show that the intersection over union (IoU) of the proposed method is higher than that of the best method compared in this paper on every dataset, by 1% on average. Future work includes investigating similar integrative approaches for other remote sensing tasks, refining and extending the architecture to handle the varying scales and complexities of urban landscapes, and translating these gains into practical benefits for decision-makers in urban development and disaster response.
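The abstract names two quantities without giving formulas: the IoU used for evaluation and an edge loss added to the training objective. The sketch below illustrates one common way to realize both, and it is not the authors' implementation. It assumes binary masks stored as nested lists, boundary pixels defined by a 4-neighbourhood test, binary cross-entropy as the loss for both terms, and a hypothetical weight `edge_weight`; none of these choices is stated in the source.

```python
import math

def iou(pred, truth):
    """Intersection over union of two binary masks (lists of 0/1 rows)."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    inter = sum(p & t for p, t in pairs)
    union = sum(p | t for p, t in pairs)
    return inter / union if union else 1.0

def edge_map(mask):
    """Mark building pixels that touch background or the image border.

    Assumption: a 4-neighbourhood boundary definition, chosen for illustration.
    """
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < h and 0 <= nx < w)
                if outside or mask[ny][nx] == 0:
                    edges[y][x] = 1
                    break
    return edges

def bce(prob, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    total, count = 0.0, 0
    for pr, tr in zip(prob, target):
        for p, t in zip(pr, tr):
            p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1 - p)
            count += 1
    return total / count

def total_loss(seg_prob, edge_prob, truth, edge_weight=0.5):
    """Segmentation loss plus a weighted loss on the boundary map.

    edge_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    return bce(seg_prob, truth) + edge_weight * bce(edge_prob, edge_map(truth))

# A 3x3 building in a 5x5 tile, and a prediction with an internal hole.
truth = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
pred = [row[:] for row in truth]
pred[2][2] = 0  # the kind of hole the abstract attributes to CNN-only models

print(iou(pred, truth))
print(sum(map(sum, edge_map(truth))))
```

In this toy case the hole removes one of the nine building pixels, so the IoU drops below 1 even though every boundary pixel is correct. An edge term penalizes boundary errors separately from interior errors, which is the motivation the abstract gives for adding it to the objective.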

Key words

remote sensing/building extraction/deep learning/dual-stream network/edge loss/local-global feature fusion

Publication year

2024

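遥感学报 (National Remote Sensing Bulletin)

Environmental Remote Sensing Branch of the Geographical Society of China; Institute of Remote Sensing Applications, Chinese Academy of Sciences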

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 2.921

ISSN: 1007-4619
