A comprehensive survey and assumption of remote sensing foundation models
In recent years, remote sensing intelligent interpretation technologies have advanced rapidly, but most established models are task-oriented. Generalizing them to different tasks is therefore difficult, and considerable resources are wasted. The foundation model is a straightforward approach to this problem and has recently attracted considerable interest in the field of remote sensing. Although many works have achieved remarkable results in perceptual recognition and cognitive prediction tasks using single-temporal or multitemporal remote sensing data, a comprehensive review that provides a systematic overview of remote sensing foundation models is lacking. Thus, this paper begins by summarizing research developments on existing remote sensing foundation models from the perspectives of data, methods, and applications. Then, after analyzing the limitations of the current situation, we propose a novel general predictive foundation model. Finally, we highlight some essential research areas and link past achievements with the future possibilities of remote sensing foundation models.

Existing remote sensing foundation models are categorized into three groups according to the data types used (single-temporal/multitemporal) and the tasks involved (perceptual recognition/cognitive prediction): foundation models for perceptual recognition based on single-temporal data, foundation models for perceptual recognition based on multitemporal data, and foundation models for cognitive prediction based on multitemporal data. According to the self-supervised learning method adopted, we divide the existing foundation models for perceptual recognition based on single-temporal data into those based on contrastive learning and those based on generative learning. According to the number of tasks, the foundation models for perceptual recognition based on multitemporal data are divided into single-task-oriented and multitask-oriented foundation models. According to model architecture, the foundation models for cognitive prediction based on multitemporal data are divided into transformer-based and graph network-based foundation models. In accordance with this categorization, we describe the current state of each type of remote sensing foundation model and summarize their limitations in terms of data, methods, and applications.

Based on this summary and analysis of existing remote sensing foundation models, we propose, as an assumption, a novel general predictive foundation model. By extracting stable and generalized time-series hyper-pixel features, the information pipeline from multidomain and multitemporal data input to task output at multiple temporal and spatial scales can be opened up. This approach enables accurate cognitive prediction of future states. In terms of data, tens of millions of multiplatform, multitype, multimodal, and multitemporal data are included. In terms of methods, a new foundation model architecture is created by combining the benefits of the transformer model and the graph network, which increases the model's capacity and enhances generalization while predicting multitarget interactions in large remote sensing scenes over the long term. In terms of applications, the general predictive foundation model can be applied to diverse cognitive prediction tasks at multiple spatial and temporal scales. Under this assumption, we propose four exploratory directions: multidomain time-series data representation, stable feature extraction, object-environment interaction modeling, and multitask interaction reasoning, aiming to provide a reference for researchers exploring remote sensing foundation models.

In general, foundation models with generalization ability are crucial to the development of remote sensing intelligent interpretation. We provide an overview of current advances in this field by collating the state of research on remote sensing foundation models. By analyzing the limitations of current remote sensing foundation models in terms of data, methods, and applications, we propose a novel general predictive foundation model assumption and further clarify four exploratory directions that urgently need breakthroughs under this idea. Follow-up work will target specific technological breakthroughs in multidomain time-series data representation, stable feature extraction, object-environment interaction modeling, and multitask interaction reasoning. We explore a general remote sensing foundation model that integrates perceptual recognition and cognitive prediction into a single architecture.
remote sensing intelligent interpretation; remote sensing foundation models; general prediction; multitemporal data; multitask