Joint classification of hyperspectral and LiDAR data based on inter-modality match learning
Several excellent models for the joint classification of hyperspectral image and LiDAR data, designed on the basis of supervised learning methods such as convolutional neural networks, have been developed in recent years. Their classification performance depends largely on the quantity and quality of training samples. However, as the distribution of ground objects becomes increasingly complex and the resolution of remote sensing images continues to grow, obtaining high-quality labels with limited cost and manpower becomes difficult. Numerous scholars have therefore made efforts to learn features directly from unlabeled samples. For instance, autoencoders have been applied to multimodal joint classification with satisfactory performance. Such reconstruction-based methods reduce the dependence on labeled information to a certain extent, but several problems remain to be settled. In particular, these methods concentrate on reconstructing the data and cannot guarantee that the extracted features are sufficiently discriminative, which limits joint classification performance.

This paper proposes an effective model, named Joint Classification of Hyperspectral and LiDAR Data Based on Inter-Modality Match Learning, to address this issue. Unlike feature extraction models based on the reconstruction idea, the proposed model compares the matching relationship between samples from different modalities, thereby enhancing the discriminative capability of the learned features. Specifically, the model comprises an inter-modality matching learning network and a multimodal joint classification network. The former learns to identify whether an input pair of hyperspectral and LiDAR patches matches, so a reasonable construction of matching labels is essential. To this end, the spatial positions of the center pixels of the cropped patches and KMeans clustering are employed to build the labels, and the constructed labels and patch pairs are combined to train the network. Notably, this process uses no manually labeled information and can extract features directly from abundant unlabeled samples. In the joint classification stage, the structure and trained parameters of the matching learning network are transferred, and a small number of manually labeled training samples are then used to fine-tune the model parameters.

Extensive experiments were conducted on two widely used datasets, Houston and MUUFL, to verify the effectiveness of the proposed model, including comparisons with several state-of-the-art models, hyperparameter analyses, and ablation studies. In the comparison experiments, against models such as CNN, EMFNet, AE_H, AE_HL, CAE_H, CAE_HL, IP-CNN, and PToP CNN, the proposed model achieves higher performance on both datasets, with overall accuracies (OAs) of 88.39% and 81.46%, respectively. Overall, the proposed model reduces the dependence on manually labeled data and improves joint classification accuracy when training samples are limited. Superior model structures and additional test datasets will be explored in future work.
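
As an illustration of the label-construction step described above, the following is a minimal sketch, assuming an HSI cube of shape (H, W, B) and a co-registered LiDAR raster of shape (H, W); all function names and parameter values are hypothetical, not the paper's actual code. Patch pairs cropped around the same center pixel are labeled as matching, while non-matching pairs draw the LiDAR patch from a center pixel that falls into a different KMeans cluster.

    import numpy as np
    from sklearn.cluster import KMeans

    def build_matching_pairs(hsi, lidar, patch_size=11, n_clusters=20, n_pairs=10000, seed=0):
        """hsi: (H, W, B) hyperspectral cube; lidar: (H, W) co-registered elevation raster."""
        rng = np.random.default_rng(seed)
        H, W, B = hsi.shape
        r = patch_size // 2

        # Cluster the per-pixel spectra so that non-matching negatives can be drawn
        # from pixels that likely belong to different ground objects.
        cluster_map = KMeans(n_clusters=n_clusters, n_init=10,
                             random_state=seed).fit_predict(hsi.reshape(-1, B)).reshape(H, W)

        def crop(img, y, x):
            return img[y - r:y + r + 1, x - r:x + r + 1]

        pairs, labels = [], []
        for y, x in zip(rng.integers(r, H - r, size=n_pairs),
                        rng.integers(r, W - r, size=n_pairs)):
            if rng.random() < 0.5:
                # Matching pair: both patches share the same center pixel.
                pairs.append((crop(hsi, y, x), crop(lidar, y, x)))
                labels.append(1)
            else:
                # Non-matching pair: the LiDAR patch is re-centered on a pixel
                # whose KMeans cluster differs from that of the HSI center.
                while True:
                    y2, x2 = rng.integers(r, H - r), rng.integers(r, W - r)
                    if cluster_map[y2, x2] != cluster_map[y, x]:
                        break
                pairs.append((crop(hsi, y, x), crop(lidar, y2, x2)))
                labels.append(0)
        return pairs, np.array(labels)

No manual annotation enters this procedure; the pseudo-labels come entirely from patch geometry and unsupervised clustering.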
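
The inter-modality matching network itself can likewise be sketched as a two-branch model with a binary match head. This is only an assumed PyTorch layout; the layer widths, the band count of 144, and the head design are placeholders rather than the architecture reported in the paper.

    import torch
    import torch.nn as nn

    class MatchNet(nn.Module):
        def __init__(self, hsi_bands: int = 144, feat_dim: int = 64):
            super().__init__()
            # HSI branch: 2-D convolutions over the spectral patch.
            self.hsi_branch = nn.Sequential(
                nn.Conv2d(hsi_bands, 128, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(128, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # LiDAR branch: single-channel elevation patch.
            self.lidar_branch = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # Binary head: does this HSI patch match this LiDAR patch?
            self.match_head = nn.Sequential(
                nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, 1),
            )

        def forward(self, hsi_patch, lidar_patch):
            f = torch.cat([self.hsi_branch(hsi_patch),
                           self.lidar_branch(lidar_patch)], dim=1)
            return self.match_head(f)  # logits for BCEWithLogitsLoss

    # One training step with the constructed matching labels (no manual annotation):
    # loss = nn.BCEWithLogitsLoss()(model(hsi_b, lidar_b).squeeze(1), match_labels.float())

Training this network against the constructed matching labels mirrors the self-supervised pretraining stage described above.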
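
Finally, the transfer and fine-tuning stage amounts to reusing the pretrained branches of the MatchNet sketch and replacing the match head with a classification head, as in the hedged sketch below; the class count, checkpoint path, and optimizer settings are assumptions.

    import torch
    import torch.nn as nn

    class JointClassifier(nn.Module):
        def __init__(self, pretrained: MatchNet, n_classes: int = 15, feat_dim: int = 64):
            super().__init__()
            # Transfer the structure and trained parameters of both branches.
            self.hsi_branch = pretrained.hsi_branch
            self.lidar_branch = pretrained.lidar_branch
            self.classifier = nn.Linear(2 * feat_dim, n_classes)

        def forward(self, hsi_patch, lidar_patch):
            f = torch.cat([self.hsi_branch(hsi_patch),
                           self.lidar_branch(lidar_patch)], dim=1)
            return self.classifier(f)

    matchnet = MatchNet()
    # matchnet.load_state_dict(torch.load("matchnet_pretrained.pt"))  # hypothetical checkpoint
    model = JointClassifier(matchnet)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # fine-tune all parameters
    criterion = nn.CrossEntropyLoss()  # applied to the few manually labeled samples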