Collaborative classification of hyperspectral and LiDAR data based on CNN-Transformer
To tackle the challenges of multimodal classification with hyperspectral images (HSI) and LiDAR data, such as cross-modal information expression and feature alignment, this paper introduces a contrastive-learning-based multi-branch CNN-Transformer network (CLCT-Net) for the joint classification of hyperspectral and LiDAR data. First, CLCT-Net employs a feature extraction module built on a ConvNeXt V2 block to capture features shared across modalities, addressing the semantic alignment problem between data from heterogeneous sensors. It then constructs a dual-branch HSI encoder with spatial-channel and spectral-context branches, alongside a LiDAR encoder enhanced by a frequency-domain self-attention mechanism, to obtain more comprehensive feature representations. Finally, it applies ensemble contrastive learning to the classification stage to further improve the accuracy of multimodal collaborative classification. Experiments on the Houston 2013 and Trento datasets show that the proposed model excels at extracting and integrating cross-modal features, achieving ground-object classification accuracies of 92.01% and 98.90%, respectively, outperforming existing models for the joint classification of hyperspectral images and LiDAR data.
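The abstract does not specify the exact form of the contrastive objective used by CLCT-Net. As an illustrative sketch only, a common choice for aligning embeddings from two modalities (here, hypothetical HSI and LiDAR embedding matrices) is a cross-modal InfoNCE loss, where matching pixel pairs across modalities act as positives and all other pairs in the batch act as negatives:

```python
import numpy as np

def cross_modal_info_nce(h, l, temperature=0.07):
    """Illustrative cross-modal InfoNCE loss (not the paper's exact loss).

    h: HSI embeddings, shape [batch, dim]
    l: LiDAR embeddings, shape [batch, dim]
    Row i of h and row i of l are assumed to describe the same ground
    location (the positive pair); every other row is a negative.
    """
    # L2-normalize so the dot product is cosine similarity
    h = h / np.linalg.norm(h, axis=1, keepdims=True)
    l = l / np.linalg.norm(l, axis=1, keepdims=True)

    logits = h @ l.T / temperature               # [batch, batch] similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positives sit on the diagonal; minimize their negative log-likelihood
    return -np.mean(np.diag(log_prob))
```

Minimizing this loss pulls the two modalities' embeddings of the same location together while pushing apart embeddings of different locations, which is the general mechanism contrastive learning uses to improve cross-modal feature alignment.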