Journal Information
ISPRS Journal of Photogrammetry and Remote Sensing
Elsevier
Bimonthly

ISSN: 0924-2716

Indexed in: SCI, AHCI, ISTP, EI
Officially published
Indexed years

    Text-Guided Coarse-to-Fine Fusion Network for robust remote sensing visual question answering

    Zhao Z., Zhou C., Zhang Y., Li C., et al.
    pp. 1-17
    Abstract: © 2025 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Remote Sensing Visual Question Answering (RSVQA) has gained significant research interest. However, current RSVQA methods are limited by the imaging mechanisms of optical sensors, particularly under challenging conditions such as cloud-covered and low-light scenarios. Given the all-time, all-weather imaging capability of Synthetic Aperture Radar (SAR), it is crucial to investigate the integration of optical-SAR images to improve RSVQA performance. In this work, we propose a Text-Guided Coarse-to-Fine Fusion Network (TGFNet), which leverages the semantic relationships between question text and multi-source images to guide the network toward complementary fusion at the feature level. Specifically, we develop a Text-Guided Coarse-to-Fine Attention Refinement (CFAR) module to focus on key areas related to the question in complex remote sensing images. This module progressively directs attention from broad areas to finer details through key region routing, enhancing the model's ability to focus on relevant regions. Furthermore, we propose an Adaptive Multi-Expert Fusion (AMEF) module that dynamically integrates different experts, enabling the adaptive fusion of optical and SAR features. In addition, we create the first large-scale benchmark dataset for evaluating optical-SAR RSVQA methods, comprising 7,108 well-aligned optical-SAR image pairs and 1,131,730 well-labeled question–answer pairs across 16 diverse question types, including complex relational reasoning questions. Extensive experiments on the proposed dataset demonstrate that our TGFNet effectively integrates complementary information from optical and SAR images, significantly improving the model's performance in challenging scenarios. The dataset is available at: https://github.com/mmic-lcl/.
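To make the multi-expert fusion idea concrete, below is a minimal PyTorch sketch of gated expert fusion over paired optical/SAR features. The class name, expert design, and dimensions are our illustrative assumptions, not the authors' AMEF implementation.

```python
# Hypothetical sketch of adaptive multi-expert fusion: a gate predicts per-expert
# weights from the concatenated optical/SAR features, and the expert outputs are
# blended accordingly. Shapes and expert design are assumptions for illustration.
import torch
import torch.nn as nn

class AdaptiveMultiExpertFusion(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU()) for _ in range(num_experts)]
        )
        self.gate = nn.Linear(2 * dim, num_experts)  # per-token expert weights

    def forward(self, opt_feat: torch.Tensor, sar_feat: torch.Tensor) -> torch.Tensor:
        x = torch.cat([opt_feat, sar_feat], dim=-1)            # (B, N, 2*dim)
        weights = torch.softmax(self.gate(x), dim=-1)          # (B, N, E)
        outs = torch.stack([e(x) for e in self.experts], -1)   # (B, N, dim, E)
        return (outs * weights.unsqueeze(-2)).sum(-1)          # (B, N, dim)

fused = AdaptiveMultiExpertFusion(dim=256)(torch.randn(2, 196, 256), torch.randn(2, 196, 256))
```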

    Contextual boundary-aware network for semantic segmentation of complex land transportation point cloud scenes

    Chen Y., Xia J., Zou X., Xiao Z., et al.
    pp. 18-31
    Abstract: © 2025 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Semantic segmentation of land transportation scenes is critical for infrastructure maintenance and the advancement of intelligent transportation systems. Unlike traditional large-scale scenes, land transportation environments present intricate structural dependencies among infrastructure elements and pronounced class imbalance. To address these challenges, we propose a Gaussian-enhanced positional encoding block that leverages the Gaussian function's intrinsic smoothing and reweighting properties to project relative positional information into a higher-dimensional space. By fusing this enhanced representation with the original positional encoding, the model gains a more nuanced understanding of spatial dependencies among infrastructures, thereby improving its capacity for semantic segmentation of complex land transportation scenes. Furthermore, we introduce the Multi-Context Interaction Module (MCIM) into the backbone network, varying the number of MCIMs across network levels to strengthen inter-layer context interactions and mitigate error accumulation. To mitigate class imbalance and excessive object adhesion within the scene, we incorporate a boundary-aware class-balanced (BCB) hybrid loss function. Comprehensive experiments on three distinct land transportation datasets validate the effectiveness of our approach, with comparative analyses against state-of-the-art methods demonstrating its consistent superiority. Specifically, our method attains the highest mIoU (91.8%) and OA (96.7%) on the high-speed rail dataset ExpressRail, the highest mIoU (73.3%) on the traditional railway dataset SNCF, and the highest mF1-score (87.4%) on the urban road dataset Pairs3D. Code is available at: https://github.com/Kange7/CoBa.
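The Gaussian-enhanced positional encoding can be pictured as projecting relative neighbour offsets through a bank of Gaussian kernels. Below is a hedged sketch under assumed kernel counts and widths, not the paper's exact formulation.

```python
# Sketch: relative xyz offsets to K neighbours are mapped through Gaussian kernels
# centred at increasing distances, giving a smooth higher-dimensional embedding
# that is concatenated with the raw offsets. Kernel count/width are assumptions.
import torch
import torch.nn as nn

class GaussianPositionalEncoding(nn.Module):
    def __init__(self, num_kernels: int = 16, max_dist: float = 1.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(0.0, max_dist, num_kernels))
        self.sigma = max_dist / num_kernels  # shared kernel width

    def forward(self, rel_pos: torch.Tensor) -> torch.Tensor:
        # rel_pos: (N, K, 3) relative offsets from each point to its K neighbours
        dist = rel_pos.norm(dim=-1, keepdim=True)                  # (N, K, 1)
        gauss = torch.exp(-((dist - self.centers) ** 2) / (2 * self.sigma ** 2))
        return torch.cat([rel_pos, gauss], dim=-1)                 # (N, K, 3 + num_kernels)
```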

    Temporal downscaling meteorological variables to unseen moments: Continuous temporal downscaling via Multi-source Spatial–temporal-wavelet feature Fusion and Time-Continuous Manifold

    Gao S., Lin L., Zhang Z., Wang J., et al.
    pp. 32-54
    Abstract: © 2025 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Accurate modeling of meteorological variables with high temporal resolution is crucial for simulations and decision-making in aviation, aerospace, and other engineering sectors. Conventional meteorological products typically have temporal resolutions exceeding one hour, hindering the characterization of short-term nonlinear evolutions in meteorological variables. Current temporal downscaling methods encounter challenges of insufficient multi-source data fusion, limited extrapolation capabilities of data distributions, and inadequate learning of spatiotemporal dependencies, leading to low modeling accuracy and difficulties in modeling meteorological environments at temporal resolutions higher than those in the training data. To address these issues, this study proposes MSF-TCMA (Multi-source Spatial–temporal-wavelet feature Fusion and Time-Continuous Manifold-based Algorithm) for continuous temporal downscaling. The algorithm introduces a multiscale deep-wavelet feature extraction branch to integrate spatial dependence and a cross-modal spatiotemporal information fusion branch to fuse multi-source information and learn temporal dependence. A time-continuous manifold sampling branch is used to solve the problem of data distribution extrapolation. Finally, the algorithm's continuous downscaling performance is optimized with a joint loss combining multi-moment weighted meteorological state estimation and energy-change deduction. Two regional case studies demonstrate that MSF-TCMA achieves modeling errors of less than 0.65 K for 2-meter temperature, less than 36.24 Pa for surface pressure, and less than 0.38 m/s for wind speed over a 6-hour time interval, with errors reduced by 3.99%–99.64% compared to the comparison methods. Furthermore, two engineering experiments demonstrate that the method realizes continuous downscaling at multiple moments within a time interval (including moments unseen during the algorithm's training phase), as well as downscaling prediction of future meteorological states based on GFS forecast data. This study provides a new paradigm for high-precision, high-temporal-resolution reconstruction of meteorological data, which is of great application value for the optimization and risk control of complex engineering activities. The code is available at: https://github.com/shermo1415/MSF-TCMA/.
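The key property, querying moments unseen during training, can be illustrated with a generic implicit-time head: a network conditioned on a continuous timestamp reconstructs the field between two bracketing analysis times. This is a simplified stand-in of our own, not the paper's manifold sampling branch.

```python
# Sketch of continuous-time querying: features from the two bracketing hours plus a
# continuous timestamp t in [0, 1] are decoded to the field value at that moment.
# Architecture and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class TimeContinuousHead(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * feat_dim + 1, 128), nn.GELU(), nn.Linear(128, 1))

    def forward(self, f0: torch.Tensor, f1: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # f0, f1: (B, P, feat_dim) features at the bracketing times; t: (B, 1)
        t = t[:, None, :].expand(-1, f0.shape[1], -1)    # broadcast t to every grid point
        return self.net(torch.cat([f0, f1, t], dim=-1))  # (B, P, 1) downscaled value
```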

    BUD: Band-limited uncalibrated detector of environmental changes for InSAR monitoring framework

    Costa G., Monti Guarnieri A.V., Manzoni M., Parizzi A., et al.
    pp. 55-72
    Abstract: © 2025 The Authors. Synthetic Aperture Radar (SAR) is used in a wide variety of fields, such as monitoring failures and measuring infrastructure health. Detecting spatio-temporal changes in the observed scene is of paramount importance, particularly for the prevention of hazards. In this paper, we propose a novel nonparametric method called the Band-limited Uncalibrated Detector (BUD) for change detection using InSAR coherence. BUD is a flexible, robust, and responsive tool designed for monitoring applications. It directly inspects observed data, making inferences without relying on strong theoretical assumptions or requiring calibration with known stable targets. It achieves this by applying a nonparametric statistical hypothesis test to multi-temporal InSAR coherence samples, specifically looking for differences in their statistical distributions. After outlining the theoretical principles of our proposed algorithm, we present a synthetic performance analysis comparing BUD with various state-of-the-art methods. Then, BUD is applied to two challenging real-world scenarios crucial for monitoring applications: an open-pit mining site, known for frequent and composite environmental changes, and an urban area, which typically experiences infrequent changes demanding highly responsive change detection methods. In both cases, we provide a comparison with other leading methods. Finally, we cross-validate BUD in the open-pit mine scenario by intersecting analysis results from three different InSAR datasets covering the same area of interest, featuring diverse acquisition geometries and operational bandwidths (X-Band and C-Band), proposing a novel way to interpret InSAR data. The algorithm's final validation is achieved using available ground truth data in the urban scenario.
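The core idea, testing whether coherence samples before and after a date come from the same distribution without calibration targets, can be illustrated with a standard two-sample Kolmogorov-Smirnov test. BUD's actual statistic is not reproduced here; this is only a stand-in for the nonparametric screening step.

```python
# Stand-in for nonparametric change screening on multi-temporal InSAR coherence:
# flag a change when pre- and post-window samples fail a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

def coherence_change_flag(coh_series: np.ndarray, split: int, alpha: float = 0.01) -> bool:
    """coh_series: 1-D coherence samples for one target; split: candidate change index."""
    stat, p_value = ks_2samp(coh_series[:split], coh_series[split:])
    return p_value < alpha  # reject "same distribution" -> change detected

rng = np.random.default_rng(0)
series = np.concatenate([rng.beta(8, 2, 30), rng.beta(2, 4, 30)])  # simulated coherence drop
print(coherence_change_flag(series, split=30))  # True
```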

    GSTM-SCD: Graph-enhanced spatio-temporal state space model for semantic change detection in multi-temporal remote sensing images

    Liu X., Dai C., Ding L., Zhang Z., et al.
    pp. 73-91
    Abstract: © 2025 The Authors. Multi-temporal semantic change detection (MT-SCD) provides crucial information for a wide variety of applications, including land use monitoring, urban planning, and sustainable development. However, previous deep learning-based SCD approaches exhibit limitations in time-series semantic change analysis, particularly in understanding Earth surface change dynamics. Specifically, existing methods typically employ Siamese networks to exploit multi-temporal information. This hinders temporal interaction and fails to comprehensively model spatio-temporal dependencies, causing substantial classification and detection errors in complex scenes. Another key issue is the neglect of temporal transitivity consistency, resulting in predictions that contradict the multi-temporal change chain rules inherent to MT-SCD. Furthermore, existing approaches do not consider dynamic adaptation to the number of observation dates, failing to process time-series remote sensing images (RSIs) with arbitrary time steps. To address these challenges, we propose a graph-enhanced spatio-temporal Mamba (GSTM-SCD) for MT-SCD (including both bi-temporal SCD and time-series SCD). It employs vision state space models to capture the spatio-temporal dependencies in multi-temporal RSIs, and leverages graph modeling to enhance inter-temporal dependencies. First, we employ a single-branch Mamba encoder to efficiently exploit multi-temporal semantics and construct a spatio-temporal graph optimization mechanism to facilitate interactions between multi-temporal RSIs, while maintaining spatial continuity of feature representations. Second, we introduce a bidirectional three-dimensional change scanning strategy to learn underlying semantic change patterns. Finally, a novel loss function tailored for time-series SCD is proposed, which regularizes the multi-temporal topological relationships within the data. The resulting approach, GSTM-SCD, demonstrates significant accuracy improvements compared to state-of-the-art (SOTA) methods. Experiments conducted on four open benchmark datasets (SECOND, Landsat-SCD, WUSU and DynamicEarthNet) demonstrate that our method surpasses the SOTA by 0.53%, 1.66%, 9.32% and 0.78% in SeK, respectively. Moreover, it significantly reduces computational costs in comparison with recent SOTA methods. The associated code is available at: https://github.com/liuxuanguang/GSTM-SCD.
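The temporal-transitivity idea can be approximated with a simple regularizer: wherever the model predicts "no change" between two dates, the two per-date semantic distributions should agree. The following is an illustrative surrogate with assumed tensor layouts, not the paper's actual loss.

```python
# Surrogate consistency regularizer: penalize disagreement between per-date semantic
# distributions on pixels predicted as unchanged for that date pair.
import torch
import torch.nn.functional as F

def temporal_consistency_loss(sem_logits: torch.Tensor, no_change_prob: torch.Tensor) -> torch.Tensor:
    # sem_logits: (T, B, C, H, W) per-date logits; no_change_prob: (T, T, B, H, W)
    T = sem_logits.shape[0]
    log_p = F.log_softmax(sem_logits, dim=2)
    loss = sem_logits.new_zeros(())
    for i in range(T):
        for k in range(i + 1, T):
            # KL divergence from date k's distribution to date i's, per pixel
            kl = F.kl_div(log_p[i], log_p[k].exp(), reduction="none").sum(1)  # (B, H, W)
            loss = loss + (no_change_prob[i, k] * kl).mean()
    return loss / (T * (T - 1) / 2)
```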

    Global urban high-resolution scene classification via uncertainty-aware domain generalization

    Yi J., Zhong Y., Yang R., Liu Y., et al.
    pp. 92-108
    Abstract: © 2025 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Global urban scene classification is a crucial technology for global land use mapping, holding significant importance in driving urban intelligence forward. When applying urban scene datasets at a global scale, two serious problems arise. Due to cultural, economic, and other factors, style differences exist between scenes from different cities, posing challenges for model generalization. Additionally, urban scene samples often follow a long-tailed distribution, complicating the identification of tail categories with small sample volumes and impairing performance under domain generalization settings. To tackle these problems, the Uncertainty-aware Domain Generalization urban scene classification (UADG) framework is constructed. To mitigate city-related style differences among global cities, a city-related whitening module is proposed that uses whitening operations to separate city-unrelated content features while adaptively preserving city-related information hidden in style features, rather than directly removing style information, thus yielding more robust representations. To tackle the significant accuracy decline of tail classes under domain generalization, estimated uncertainty is utilized to guide a mixture of experts, and reasonable expert assignment is conducted for hard samples to balance the model bias. To evaluate the proposed UADG framework under practical scenarios, the Domain Generalized Urban Scene (DGUS) dataset is curated for validation, with a training set comprising 42 classes of samples from 34 provincial capitals in China, and test samples selected from representative cities across six continents. Extensive experiments demonstrate that our method achieves state-of-the-art performance, notably outperforming the baseline GAMMA by 9.79% and 7.42% in average OA and AA on the unseen domains of DGUS, respectively. UADG greatly enhances the automation of global urban land use mapping.
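The whitening-based separation can be pictured as splitting feature maps into whitened content and per-sample statistics treated as style. A minimal sketch follows, with the names and the split rule assumed rather than taken from the UADG code.

```python
# Sketch: instance-level whitening removes per-sample channel statistics (style)
# while keeping them as a separate vector that later modules may adaptively reuse.
import torch

def split_content_style(feat: torch.Tensor, eps: float = 1e-5):
    # feat: (B, C, H, W) backbone features
    mu = feat.mean(dim=(2, 3), keepdim=True)           # channel means  (style)
    sigma = feat.std(dim=(2, 3), keepdim=True) + eps   # channel stds   (style)
    content = (feat - mu) / sigma                      # whitened, city-agnostic content
    style = torch.cat([mu, sigma], dim=1).flatten(1)   # (B, 2C) city-related statistics
    return content, style
```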

    T-graph: Enhancing sparse-view camera pose estimation by pairwise translation graph

    Xian Q., van der Zwaag B.J., Huang Y., Jiao W., et al.
    pp. 109-125
    Abstract: © 2025 The Authors. Sparse-view camera pose estimation, which aims to recover 6-Degree-of-Freedom (6-DoF) poses from a limited number of unordered multi-view images, is fundamental yet challenging in remote sensing. Learning-based methods offer greater robustness than traditional Structure-from-Motion (SfM) pipelines by leveraging dense high-dimensional features and implicit learning, rather than sparse keypoints and limited geometric constraints. However, they often neglect pairwise translation cues between views, resulting in suboptimal performance in sparse-view scenarios. To address this limitation, we introduce T-Graph, a lightweight, plug-and-play module to enhance camera pose estimation in sparse-view settings. T-Graph takes paired image features as input and maps them through a Multilayer Perceptron (MLP). It then constructs a fully connected translation graph, where nodes represent cameras and edges encode their translation relationships. It can be seamlessly integrated into most existing learning-based models as an additional branch in parallel with the original prediction, maintaining efficiency and ease of use. Furthermore, we introduce two pairwise translation representations, relative-t and pair-t, formulated under different local coordinate systems. While relative-t captures intuitive spatial relationships, pair-t offers a rotation-disentangled alternative. The two representations contribute to enhanced adaptability across diverse application scenarios, further improving our module's robustness. We further propose an indicator termed the Camera Axis Dispersion Ratio (CADR) to quantitatively assess which type of pairwise translation representation is better suited for a given camera configuration in a dataset. Extensive experiments on three representative methods (RelPose++, Forge and 8Pt-ViT) using public datasets (CO3D and IMC PhotoTourism) validate both the effectiveness and generalizability of T-Graph. The results demonstrate consistent improvements across various metrics, notably camera center accuracy, which improves by up to 6% across 2 to 8 viewpoints.
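Below is a compact sketch of the pairwise translation graph under assumed shapes: one global feature per view, an MLP over each feature pair, and one predicted translation per edge of the fully connected graph. Feature dimension and MLP design are our assumptions.

```python
# Sketch of a fully connected pairwise translation graph: nodes are cameras, and an
# MLP maps each concatenated feature pair to a 3-D translation for that edge.
import itertools
import torch
import torch.nn as nn

class TranslationGraph(nn.Module):
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * feat_dim, 256), nn.ReLU(), nn.Linear(256, 3))

    def forward(self, feats: torch.Tensor):
        # feats: (N, feat_dim), one global feature per view
        edges = list(itertools.combinations(range(feats.shape[0]), 2))
        pairs = torch.stack([torch.cat([feats[i], feats[j]]) for i, j in edges])
        return edges, self.mlp(pairs)  # E edges, (E, 3) pairwise translations
```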

    Domain generalization for semantic segmentation of remote sensing images via vision foundation model fine-tuning

    Luo M., Zan Y., Ji S., Khoshelham K., et al.
    pp. 126-146
    Abstract: © 2025 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Practice-oriented and general-purpose deep semantic segmentation models are required to be effective in various application scenarios without heavy re-training or with minimal fine-tuning. This calls for the domain generalization ability of models. Vision Foundation Models (VFMs), trained on massive and diverse datasets, have shown impressive generalization capabilities in computer vision tasks. However, how to utilize their generalization ability for remote sensing cross-domain semantic segmentation remains understudied. In this paper, we seek to identify the most suitable VFM for remote sensing images and further enhance its generalization ability in the context of remote sensing image segmentation. Our study begins with a comprehensive evaluation of the generalization ability of various VFMs and classic CNN or transformer backbone networks under different settings. We discover that DINO v2 ViT-L outperforms other backbones with frozen parameters or full fine-tuning. Building upon DINO v2, we propose a novel domain generalization framework from both the data and deep-feature perspectives. This framework incorporates two key modules, the Geospatial Semantic Adapter (GeoSA) and the Batch Style Augmenter (BaSA), which together unlock the potential of DINO v2 in remote sensing image semantic segmentation. GeoSA consists of three core components: enhancer, bridge and extractor. These components work synergistically to extract robust features from the pre-trained DINO v2 and generate multi-scale features adapted to remote sensing images. BaSA employs batch-level data augmentation to reduce reliance on dataset-specific features and promote domain-invariant learning. Extensive experiments across four remote sensing datasets and four domain generalization scenarios, for both binary and multi-class semantic segmentation, consistently demonstrate our method's superior cross-domain generalization ability and robustness, surpassing advanced domain generalization methods and other VFM fine-tuning methods. Code will be released at https://github.com/mmmll23/GeoSA-BaSA.
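Batch-level style augmentation of this kind typically mixes per-channel feature statistics between samples in a batch, in the spirit of MixStyle. Here is a hedged sketch with an assumed mixing rule, not the released BaSA code.

```python
# Sketch: blend channel-wise mean/std between random batch pairs so the segmenter
# cannot rely on dataset-specific styles. Mixing coefficients are assumptions.
import torch

def batch_style_mix(feat: torch.Tensor, alpha: float = 0.5, eps: float = 1e-5) -> torch.Tensor:
    # feat: (B, C, H, W)
    B = feat.shape[0]
    mu = feat.mean(dim=(2, 3), keepdim=True)
    sigma = feat.std(dim=(2, 3), keepdim=True) + eps
    perm = torch.randperm(B)
    lam = feat.new_empty((B, 1, 1, 1)).uniform_(alpha, 1.0)  # per-sample mix weight
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sigma_mix = lam * sigma + (1 - lam) * sigma[perm]
    return (feat - mu) / sigma * sigma_mix + mu_mix
```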

    A Landsat-based burned area atlas (2000–2023) for the Niassa Special Reserve, Mozambique using U-Net deep learning

    Dias C.R.G., Neves A.K., Silva J.M.N., Pereira J.M.C., et al.
    pp. 147-169
    Abstract: © 2025 The Author(s). Savanna burning plays a key ecological role in miombo woodlands, influencing vegetation regeneration, biodiversity, and ecosystem structure. This study provides a comprehensive fire atlas and spatiotemporal assessment of fire activity from 2000 to 2023 in the Niassa Special Reserve (NSR), northern Mozambique, a key protected area in sub-Saharan Africa. Using medium-resolution satellite imagery and a deep learning classification approach (U-Net), we mapped annual burned areas and analysed spatial and temporal patterns of burning, including recurrence and seasonality. The results indicate a mean fire return interval of 2.8 years, with distinct differences between the Early Dry Season (EDS) and Late Dry Season (LDS): fire recurrence was as frequent as 1.9 years in the LDS, while EDS intervals extended up to 30 years. Fire activity was most intense in the central and eastern lowlands, while higher elevations such as Mount Mecula showed lower fire occurrence. The classification model demonstrated strong performance, with Dice Coefficients ranging from 91.4% to 94.6%. The resulting atlas offers valuable insights for adaptive fire management, biodiversity conservation, and climate resilience in the NSR and similar savanna ecosystems.
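For reference, the Dice Coefficient reported above is the standard overlap measure between predicted and reference burned masks, shown here in a minimal form.

```python
# Standard Dice coefficient for a binary burned/unburned mask.
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """pred, truth: boolean burned-area masks of equal shape."""
    intersection = np.logical_and(pred, truth).sum()
    return float(2.0 * intersection / (pred.sum() + truth.sum() + eps))
```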

    Generation of 30 m resolution monthly burned area product in Africa based on Landsat 8/9 and Sentinel-2 data

    Huang S., Long T., Zhang Z., He G., et al.
    pp. 170-191
    Abstract: © 2025 The Author(s). Accurate burned area (BA) detection is critical for understanding fire dynamics and assessing ecological impacts. However, existing continental-scale BA products are mainly at low and medium spatial resolution, making it difficult to detect small or fragmented fires and resulting in significant underestimation of BA. In this study, we propose a novel high-resolution (30 m) monthly BA mapping approach that integrates Sentinel-2 and Landsat 8/9 images on the Google Earth Engine (GEE) platform, and generate the African Monthly Burned Area product for 2019 (AMBA2019). The workflow initiates with a stratified random sampling scheme that intersects MCD12Q1 land-cover classifications with GFED5 fire-frequency zones, ensuring spatially representative training sample distributions across diverse ecosystems and fire regimes. A multi-dimensional feature stack for BA detection is constructed, encompassing fire behavior indicators, vegetation dynamics, moisture stress metrics, and temporal-difference signatures, including newly developed time-aware spectral indices. A two-stage Random Forest classification framework, trained on the stratified sample points and the BA detection features, is subsequently applied to identify candidate burned scars. To further refine the preliminary outputs of the Random Forest model, threshold testing, spatial filtering, and a region-growing algorithm are applied to reduce false positives and improve detection of small fires typically missed by coarse-resolution BA products. Validation against the Burned Area Reference Database (BARD) shows that AMBA2019 achieves overall accuracies of 96.38% and 94.69%, respectively, with the lowest commission and omission errors compared with three widely used BA products (MCD64A1, FireCCI51, and FireCCISFD20). This research offers a robust foundation for quantifying fire-induced carbon emissions and enhancing climate modeling capabilities in Africa.
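The seed-and-grow refinement can be illustrated as follows: strict-threshold pixels act as seeds that are grown into connected looser-threshold candidates. The thresholds here are placeholders, not the paper's tuned settings.

```python
# Sketch of region-growing refinement on Random Forest burned probabilities:
# keep only candidate regions (prob >= grow_thr) that contain at least one
# high-confidence seed (prob >= seed_thr).
import numpy as np
from scipy import ndimage

def region_grow_burned(prob: np.ndarray, seed_thr: float = 0.8, grow_thr: float = 0.5) -> np.ndarray:
    candidate = prob >= grow_thr
    labels, _ = ndimage.label(candidate)           # connected candidate regions
    seeded = np.unique(labels[prob >= seed_thr])   # region ids containing a seed
    return np.isin(labels, seeded[seeded > 0])     # final burned mask
```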