Abstract
© 2025 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS)
Practice-oriented and general-purpose deep semantic segmentation models are required to be effective in various application scenarios without heavy re-training or with minimal fine-tuning. This requires models with strong domain generalization ability. Vision Foundation Models (VFMs), trained on massive and diverse datasets, have shown impressive generalization capabilities in computer vision tasks. However, how to utilize their generalization ability for remote sensing cross-domain semantic segmentation remains understudied. In this paper, we seek to identify the most suitable VFM for remote sensing images and to further enhance its generalization ability in the context of remote sensing image segmentation. Our study begins with a comprehensive evaluation of the generalization ability of various VFMs and classic CNN or transformer backbone networks under different settings. We find that DINO v2 ViT-L outperforms the other backbones, whether with frozen parameters or under full fine-tuning. Building upon DINO v2, we propose a novel domain generalization framework from both data and deep feature perspectives. This framework incorporates two key modules, the Geospatial Semantic Adapter (GeoSA) and the Batch Style Augmenter (BaSA), which together unlock the potential of DINO v2 in remote sensing image semantic segmentation. GeoSA consists of three core components: an enhancer, a bridge, and an extractor. These components work together to extract robust features from the pre-trained DINO v2 and generate multi-scale features adapted to remote sensing images. BaSA employs batch-level data augmentation to reduce reliance on dataset-specific features and promote domain-invariant learning.
Extensive experiments across four remote sensing datasets and four domain generalization scenarios for both binary and multi-class semantic segmentation consistently demonstrate our method's superior cross-domain generalization ability and robustness, surpassing advanced domain generalization methods and other VFM fine-tuning methods. Code will be released at https://github.com/mmmll23/GeoSA-BaSA.
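The abstract does not specify how BaSA performs its batch-level augmentation. As a rough illustration of the general idea only, the sketch below mixes per-image channel statistics (mean and standard deviation, a common proxy for "style") between images of the same batch. The function name `batch_style_mix`, the Beta-distributed mixing weights, and the `alpha` parameter are assumptions made for this example and are not taken from the paper or its released code.

```python
import numpy as np

def batch_style_mix(images, alpha=0.5, rng=None, eps=1e-6):
    """Hypothetical batch-level style mixing (not the authors' BaSA).

    images: float array of shape (B, C, H, W).
    Returns an array of the same shape whose per-channel statistics
    are interpolated with those of another image in the batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    b = images.shape[0]
    # Per-image, per-channel mean and std over the spatial dimensions.
    mu = images.mean(axis=(2, 3), keepdims=True)
    sigma = images.std(axis=(2, 3), keepdims=True) + eps
    # Remove each image's own statistics.
    normalized = (images - mu) / sigma
    # Pair every image with a randomly chosen image from the batch.
    perm = rng.permutation(b)
    # One mixing weight per image, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha, size=(b, 1, 1, 1))
    mu_mix = lam * mu + (1.0 - lam) * mu[perm]
    sigma_mix = lam * sigma + (1.0 - lam) * sigma[perm]
    # Re-apply the interpolated statistics to the normalized content.
    return normalized * sigma_mix + mu_mix
```

In this sketch the spatial content of each image is preserved while its channel statistics drift toward those of another sample, which is one plausible way to discourage a segmentation model from relying on dataset-specific appearance. Applying it only during training, and only with some probability, would be a natural design choice.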