A self-supervised pre-training scheme for multi-source heterogeneous remote sensing image land cover classification
Deep learning has revolutionized remote sensing image processing over the past few years. Nevertheless, annotating high-quality samples is laborious, which limits the performance of deep networks owing to insufficient supervision. To resolve this contradiction, we investigate a self-supervised pre-training and fine-tuning paradigm for multi-source heterogeneous remote sensing image land cover classification, aiming to relieve the urgent need for manually annotated data. Specifically, the proposed generative feature learning model adopts an asymmetric encoder-decoder structure, in which a deep encoder extracts the high-level key characteristics contained in multi-source data and task-specific lightweight decoders reconstruct the original data. To further improve the feature representation capability, cross-attention layers exchange information between the heterogeneous features, thereby learning more complementary information from the multi-source remote sensing data. In the fine-tuning stage, the trained encoder serves as an unsupervised feature extractor, and the learned features are fed to a designed lightweight Transformer-based classifier for land cover classification. This self-supervised pre-training architecture learns high-level key features from multi-source heterogeneous remote sensing images without requiring any labeled information, thus relieving the urgent need for labeled samples. Compared with existing classification paradigms, the proposed multimodal self-supervised pre-training and fine-tuning scheme achieves superior performance for remote sensing image classification.
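The cross-attention exchange between heterogeneous modalities can be illustrated with a minimal single-head sketch (a NumPy illustration of the general mechanism, not the authors' implementation; the modality names, token counts, and feature dimension are assumptions for the example):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Single-head cross-attention: tokens from one modality (queries)
    attend to tokens from another modality (keys_values), so each query
    token becomes a weighted mixture of the other modality's tokens."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)  # (n_q, n_kv) similarities
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ keys_values                   # (n_q, d) fused features

# Hypothetical token features for two remote sensing modalities
# (e.g. optical and SAR branches of the encoder).
rng = np.random.default_rng(0)
optical = rng.standard_normal((4, 8))  # 4 tokens, 8-dim features
sar = rng.standard_normal((6, 8))      # 6 tokens, 8-dim features

fused = cross_attention(optical, sar)  # optical queries attend to SAR
print(fused.shape)  # (4, 8)
```

In a full model the queries, keys, and values would each pass through learned linear projections, and the exchange would typically run in both directions so that each modality is enriched with complementary information from the other.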