A Hybrid Algorithm for Remote Sensing Image Land Cover Classification Combining CNN and ViT
陈佳慧¹, 路鹏¹, 罗小玲¹, 郜晓晶¹, 潘新¹
Author information
1. College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot 010018, China
Abstract
Traditional remote sensing image classification methods based on machine learning and convolutional neural networks (CNNs) achieve limited overall classification accuracy, and their restricted local receptive fields lead to insufficient extraction of global features. To further improve the classification accuracy of remote sensing images, this paper proposes Hybrid CNN-ViT, a method that combines a hybrid three-dimensional and two-dimensional convolutional neural network (3D-2D CNN) with a vision transformer (ViT). After 3D and 2D convolutional kernels fully extract the spatial-spectral information of the remote sensing image, the multi-head attention mechanism of the ViT extracts global sequence information, which addresses the problem of insufficient global feature extraction. In the experiments, the image is divided into training, validation and test sets in several different proportions, and the method is compared with DBDA, DBMA and 3D-2D CNN. The results show that the proposed method reaches its highest classification accuracy when the training:validation:test ratio is 8:1:1, and that both its overall accuracy (99.47%) and its Kappa coefficient (0.9908) are higher than those of the other three methods.
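The abstract reports overall accuracy (OA) and the Kappa coefficient under a training:validation:test split of 8:1:1. The Python sketch below shows how these quantities are conventionally computed; it is not the authors' code. The function names (`split_indices`, `confusion_matrix`, `overall_accuracy`, `kappa`) are hypothetical, and the split is assumed to be a plain random split over labelled samples, because the abstract does not say whether sampling is stratified by class.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(n: int, ratios: Tuple[int, int, int] = (8, 1, 1),
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Randomly partition sample indices 0..n-1 into train/val/test by ratio.

    Assumption: a plain (non-stratified) random split; the paper's exact
    sampling scheme is not given in the abstract.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    # Any remainder from integer division goes to the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Rows are reference (true) classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm: List[List[int]]) -> float:
    """OA: fraction of all samples that lie on the confusion-matrix diagonal."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def kappa(cm: List[List[int]]) -> float:
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e), where p_e is chance agreement."""
    k = len(cm)
    total = sum(map(sum, cm))
    p_o = overall_accuracy(cm)
    row_sums = [sum(r) for r in cm]
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Expected agreement if predictions were independent of the reference labels.
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    train, val, test = split_indices(1000)
    print(len(train), len(val), len(test))  # 800 100 100

    cm = confusion_matrix([0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
                          [0, 0, 1, 1, 1, 1, 2, 2, 2, 0], num_classes=3)
    print(overall_accuracy(cm), round(kappa(cm), 3))  # 0.8 0.701
```

In this toy example 8 of 10 samples are correct, so OA is 0.8, while the chance agreement p_e is 0.33, giving Kappa = 0.47 / 0.67 ≈ 0.701. Because Kappa discounts agreement expected by chance, it is usually reported alongside OA, as the abstract does.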