ISPRS Journal of Photogrammetry and Remote Sensing, 2025, Vol. 230, Issue (Dec.): 126-146. DOI: 10.1016/j.isprsjprs.2025.09.004

Domain generalization for semantic segmentation of remote sensing images via vision foundation model fine-tuning

Luo M.¹, Zan Y.¹, Ji S.¹, Khoshelham K.²

Author information

  • 1. School of Remote Sensing and Information Engineering, Wuhan University
  • 2. Department of Infrastructure Engineering, The University of Melbourne

Abstract

© 2025 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS)

Practice-oriented, general-purpose deep semantic segmentation models must remain effective across varied application scenarios without heavy re-training or with only minimal fine-tuning. This calls for models with strong domain generalization ability. Vision Foundation Models (VFMs), trained on massive and diverse datasets, have shown impressive generalization capabilities in computer vision tasks. However, how to exploit their generalization ability for cross-domain semantic segmentation of remote sensing images remains understudied. In this paper, we seek to identify the most suitable VFM for remote sensing images and further enhance its generalization ability for remote sensing image segmentation. Our study begins with a comprehensive evaluation of the generalization ability of various VFMs and classic CNN and transformer backbone networks under different settings. We find that DINO v2 ViT-L outperforms other backbones with frozen parameters or full fine-tuning. Building upon DINO v2, we propose a novel domain generalization framework that works from both the data and the deep feature perspectives. The framework incorporates two key modules, the Geospatial Semantic Adapter (GeoSA) and the Batch Style Augmenter (BaSA), which together unlock the potential of DINO v2 in remote sensing image semantic segmentation. GeoSA consists of three core components: an enhancer, a bridge, and an extractor. These components work together to extract robust features from the pre-trained DINO v2 and generate multi-scale features adapted to remote sensing images. BaSA employs batch-level data augmentation to reduce reliance on dataset-specific features and promote domain-invariant learning.
Extensive experiments across four remote sensing datasets and four domain generalization scenarios for both binary and multi-class semantic segmentation consistently demonstrate our method's superior cross-domain generalization ability and robustness, surpassing advanced domain generalization methods and other VFM fine-tuning methods. Code will be released at https://github.com/mmmll23/GeoSA-BaSA.

Keywords

Domain generalization/Fine-tuning/Remote sensing/Semantic segmentation/Visual foundation model

Publication year

2025

ISSN: 0924-2716
Number of references: 77