MDSTA: masked diffusion spatio-temporal autoencoder for multimodal remote sensing image classification

ABSTRACT Deep learning methods have recently advanced multimodal remote sensing data classification significantly. However, difficulties in data acquisition frequently lead to missing modalities, which substantially degrade the performance of these models. Diffusion models have shown great potential in generative tasks and a strong ability to model complex data distributions. In remote sensing, however, each of the diverse modalities carries unique information, and diffusion models may struggle to recover critical information when one modality is missing, resulting in unsatisfactory performance. To address this, we propose the Masked Diffusion Spatio-Temporal Autoencoder (MDSTA) network for the joint classification of remote sensing data under arbitrary modalities. MDSTA consists of three main components: a Conditional Masked Diffusion Process (CMDP), a Reverse Diffusion Reconstruction Process (RDRP), and an Attention Multi-Layer Perceptron (MLP). The CMDP progressively adds noise to the remote sensing data to prepare for subsequent denoising and reconstruction. The RDRP extracts features from multimodal data through our Spatio-Temporal Fusion (STF) Encoder, which maps them into a shared-parameter, modality-mixed space to capture features shared across modalities. The Masked Reconstruction (MR) Decoder then uses these features to reconstruct each modality independently, which helps the network learn the unique characteristics of each modality. The attention MLP fuses the multimodal classification tokens to produce the final classification result. Furthermore, we introduce masked training in the conditional masked diffusion block to reduce memory consumption. Experiments on four datasets indicate that the proposed MDSTA model outperforms leading models.

Multimodal remote sensing; classification; missing modality; diffusion model
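
The abstract states that the CMDP progressively adds noise to the data and that masked training is used to reduce memory consumption, but it gives no formulas. The following is only a minimal sketch of that general idea, assuming a standard DDPM-style closed-form noising step, a linear noise schedule, and random patch masking. None of these choices is confirmed by the abstract, and the names `linear_beta_schedule`, `masked_forward_diffusion`, and `mask_ratio` are hypothetical.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    # Assumed linear noise schedule (a common DDPM default, not taken from the paper).
    return np.linspace(beta_start, beta_end, num_steps)


def masked_forward_diffusion(x0, t, alpha_bar, mask_ratio, rng):
    """Noise a set of patch tokens at step t and keep only a random subset.

    x0: array of shape (num_patches, dim), the clean patch tokens.
    Returns the noised visible patches, their noise targets, and their indices.
    """
    num_patches = x0.shape[0]
    num_keep = max(1, int(round(num_patches * (1.0 - mask_ratio))))
    # Randomly choose which patches stay visible; the rest are masked out,
    # so downstream layers process fewer tokens (the assumed memory saving).
    keep_idx = np.sort(rng.permutation(num_patches)[:num_keep])

    # Standard closed-form forward step: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t[keep_idx], noise[keep_idx], keep_idx


rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta)

x0 = rng.standard_normal((16, 8))  # 16 hypothetical patch tokens of width 8
x_t, noise, keep_idx = masked_forward_diffusion(
    x0, t=500, alpha_bar=alpha_bar, mask_ratio=0.5, rng=rng
)
```

With a mask ratio of 0.5, only half of the patch tokens are passed on, which is one plausible way masking could lower memory use during training; the paper's actual conditioning and masking strategy may differ.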

Zongqin Yue, Jindong Xu, Ziyi Li, Haihua Xing, Xiang Cheng

Yantai University

Hainan Normal University

Peking University

2025

International Journal of Remote Sensing

ISSN:0143-1161
Year, volume (issue): 2025, 46(11/12)
  • 57