Remote Sensing Information, 2024, Vol. 39, Issue 3: 121-127. DOI: 10.20091/j.cnki.1000-3177.2024.03.017

A Hybrid Algorithm for Remote Sensing Image Land Cover Classification Combining CNN and ViT

Chen Jiahui (陈佳慧) 1, Lu Peng (路鹏) 1, Luo Xiaoling (罗小玲) 1, Gao Xiaojing (郜晓晶) 1, Pan Xin (潘新) 1

Author information

  • 1. College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot 010018

Abstract

Traditional remote sensing image classification methods based on machine learning and convolutional neural networks (CNNs) suffer from limited overall classification accuracy and, because they are restricted to local receptive fields, from insufficient global feature extraction. To further improve the classification accuracy of remote sensing images, this paper proposes Hybrid CNN-ViT, a classification method that combines a neural network mixing three-dimensional and two-dimensional convolution kernels (3D-2D CNN) with a vision transformer (ViT). The 3D and 2D convolution kernels first extract the spatial-spectral information of the image; the multi-head attention mechanism of the ViT then extracts global sequence information, which addresses the insufficient extraction of global features. In the experiments, the imagery is split into training, validation, and test sets at several ratios, and the method is compared with DBDA, DBMA, and 3D-2D CNN. The results show that the method reaches its highest classification accuracy at a training : validation : test ratio of 8:1:1, where its overall classification accuracy (99.47%) and Kappa coefficient (0.9908) both exceed those of the other three methods.
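The two metrics reported in the abstract, overall classification accuracy and the Kappa coefficient, are conventionally derived from a confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function names and the 3-class matrix are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def overall_accuracy(confusion: Sequence[Sequence[int]]) -> float:
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def kappa_coefficient(confusion: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = overall_accuracy(confusion)
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    row_totals = [sum(confusion[i]) for i in range(n)]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    return (observed - expected) / (1.0 - expected)


# Hypothetical 3-class confusion matrix (rows: reference, columns: predicted).
# These counts are made up for illustration and are not data from the paper.
example = [
    [50, 2, 0],
    [1, 45, 4],
    [0, 3, 45],
]

if __name__ == "__main__":
    print(f"OA    = {overall_accuracy(example):.4f}")
    print(f"Kappa = {kappa_coefficient(example):.4f}")
```

Because Kappa subtracts the agreement expected by chance, it is typically lower than the overall accuracy on the same matrix, which is consistent with the abstract reporting 0.9908 alongside 99.47%.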

Key words

CNN/deep learning/vision transformer/land cover classification/image processing

Funding

National Natural Science Foundation of China (61962048)

National Natural Science Foundation of China (61562067)

Publication year

2024
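Remote Sensing Information
National Remote Sensing Center of China, Ministry of Science and Technology; Chinese Academy of Surveying and Mapping

Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 0.712
ISSN: 1000-3177
Citations: 1
References: 9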