Deep embedded Transformer network with spatial-spectral information for unmixing of hyperspectral remote sensing images
Objective In hyperspectral remote sensing, mixed pixels often exist due to the complex surfaces of natural objects and the limited spatial resolution of imaging instruments. A mixed pixel is a pixel in a hyperspectral image that contains the spectral signatures of multiple materials, which hinders the application of hyperspectral images in fields such as target detection, image classification, and environmental monitoring. Therefore, the decomposition (unmixing) of mixed pixels is a central concern in the processing of hyperspectral remote sensing images. Spectral unmixing aims to overcome the limitation of image spatial resolution by extracting the pure spectral signals (endmembers) representing each land cover class and their respective proportions (abundances) within each pixel, based on a spectral mixing model at the sub-pixel level. The rise of deep learning has brought many advanced modeling theories and architectural tools to the field of hyperspectral mixed pixel decomposition and has spawned many deep learning-based unmixing methods. Although these methods have advantages over traditional methods in information mining and generalization performance, deep networks often need to stack multiple network layers to achieve optimal learning outcomes. As a result, deep networks may damage the internal structure of the data during training, which leads to the loss of important information in hyperspectral data and reduces unmixing accuracy. In addition, most existing deep learning-based unmixing methods focus only on spectral information, while the exploitation of spatial information remains limited to shallow processing stages such as filtering and convolution. In recent years, the autoencoder has been one of the research hotspots in deep learning, and many variant networks based on autoencoders have emerged. The Transformer is a novel deep learning network with an autoencoder-like structure. It
has garnered considerable attention in fields such as natural language processing, computer vision, and time series analysis due to its powerful feature representation capability. As a neural network built primarily on the self-attention mechanism, the Transformer can better explore the underlying relationships among different features and more comprehensively aggregate the spectral and spatial correlations of pixels, which enhances abundance learning and improves unmixing accuracy. Although Transformer networks have recently been used to design unmixing methods, directly using unsupervised Transformer models to obtain features can lose many local details and makes it difficult to exploit the long-range dependency properties of Transformers effectively. Method To address these limitations, this study proposes a deep embedded Transformer network (DETN) based on the Transformer-in-Transformer architecture. The network adopts an autoencoder framework consisting of two main parts: node embedding (NE) and blind signal separation. In the first part, the input hyperspectral image is uniformly divided twice, and the resulting image patches are mapped into sub-patch sequences and patch sequences through linear transformations. The sub-patch sequences are then processed by an inner Transformer structure to obtain pixel spectral information and local spatial correlations, which are aggregated into the patch sequences for parameter and information sharing. Finally, with the local detail information in the patch sequences retained, the patch sequences are processed by an outer Transformer structure to output pixel spectral information and global spatial correlations that incorporate the local information. In the second part, the input NE is first reconstructed into an abundance map, which is smoothed with a single 2D convolution layer to suppress noise. A SoftMax layer is used to ensure
the physical meaning of the abundances (non-negativity and sum-to-one). Finally, a single 2D convolution layer reconstructs the image, and the endmembers are optimized and estimated within this convolution layer. Result To evaluate the effectiveness of the proposed method, experiments are conducted on simulated datasets and several real hyperspectral datasets, including the Samson dataset, the Jasper Ridge dataset, and part of the real hyperspectral farmland data of Nanchang City, Jiangxi Province, acquired by the Gaofen-5 satellite and provided by Beijing Shengshi Huayao Technology Co., Ltd. In addition, data from the ZY1E satellite, also provided by Beijing Shengshi Huayao Technology Co., Ltd., are used to obtain partial hyperspectral data of the Port of Marseille, France, for comparative experiments with different methods. The experimental results are quantitatively analyzed using spectral angle distance (SAD) and root mean square error (RMSE). The proposed DETN is compared with several representative unmixing algorithms: fully constrained least squares (FCLS), deep autoencoder networks for hyperspectral unmixing (DAEN), the autoencoder network for hyperspectral unmixing with adaptive abundance smoothing (AAS), the untied denoising autoencoder with sparsity (uDAS), hyperspectral unmixing using deep image prior (UnDIP), and hyperspectral unmixing using Transformer network (DeepTrans-HSU). The results demonstrate that the proposed method outperforms the compared methods in terms of SAD, RMSE, and other evaluation metrics. Conclusion The proposed method effectively captures and preserves the spectral information of pixels at both local and global levels, as well as the spatial correlations among pixels, which enables accurate extraction of endmembers that match the ground-truth spectral features. Moreover, the method produces smooth abundance maps with high spatial consistency, even in regions with hidden details in the image. These
findings validate that the DETN method provides new technical support and theoretical references for addressing the challenges posed by mixed pixels in hyperspectral image unmixing.
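The sub-pixel spectral mixing model referred to in the Objective is most commonly the linear mixing model, in which each pixel is a convex combination of endmember spectra plus noise. A minimal illustrative sketch (all names and dimensions are ours, not taken from the paper):

```python
import numpy as np

# Linear mixing model: x = E @ a + n, with a >= 0 and sum(a) = 1.
rng = np.random.default_rng(0)
n_bands, n_endmembers = 50, 3
E = rng.random((n_bands, n_endmembers))          # endmember spectra (one per column)
a = np.array([0.6, 0.3, 0.1])                    # abundances: non-negative, sum to one
x = E @ a + 0.01 * rng.standard_normal(n_bands)  # observed mixed pixel with noise

# Abundance constraints that any physically meaningful unmixing must respect:
assert (a >= 0).all() and np.isclose(a.sum(), 1.0)
```

Unmixing inverts this model: given many observed pixels x, it estimates both E and the per-pixel abundances a.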
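The two-stage uniform division in the NE part of the Method can be sketched with array reshapes; the image, patch, and sub-patch sizes below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

# Illustrative sizes: a 64 x 64 image with 50 bands, 16 x 16 patches, 4 x 4 sub-patches.
H, W, B = 64, 64, 50
P, S = 16, 4
img = np.random.rand(H, W, B)

# First division: non-overlapping patches -> patch sequence.
patches = (img.reshape(H // P, P, W // P, P, B)
              .transpose(0, 2, 1, 3, 4)
              .reshape(-1, P, P, B))                   # (16, 16, 16, 50): 16 patches

# Second division: each patch -> sub-patch sequence, flattened for linear embedding.
subs = (patches.reshape(-1, P // S, S, P // S, S, B)
               .transpose(0, 1, 3, 2, 4, 5)
               .reshape(len(patches), -1, S * S * B))  # (16, 16, 800): 16 tokens per patch
```

Under these assumed sizes, the inner Transformer would operate on the 16 sub-patch tokens of each patch, and the outer Transformer on the sequence of 16 patch tokens after aggregation.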
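The two evaluation metrics used in the Result section, SAD for endmember accuracy and RMSE for abundance accuracy, can be sketched as follows (the implementations are ours):

```python
import numpy as np

def sad(e_ref, e_est):
    """Spectral angle distance (radians) between reference and estimated endmembers."""
    cos = np.dot(e_ref, e_est) / (np.linalg.norm(e_ref) * np.linalg.norm(e_est))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def rmse(a_ref, a_est):
    """Root mean square error between reference and estimated abundance maps."""
    a_ref, a_est = np.asarray(a_ref, float), np.asarray(a_est, float)
    return float(np.sqrt(np.mean((a_ref - a_est) ** 2)))

spectrum = np.array([0.2, 0.5, 0.9])
scaled = 2.0 * spectrum                   # SAD is invariant to spectral scaling
print(round(sad(spectrum, scaled), 6))    # 0.0
print(round(rmse([[0.6, 0.4]], [[0.5, 0.5]]), 6))  # 0.1
```

SAD compares spectral shapes independently of magnitude, which is why it suits endmember evaluation, while RMSE directly penalizes per-pixel abundance deviations.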