Frequency Decomposition and Double-Branch Feature Extraction for Multispectral-Image-Compression Network
Objective Multispectral images, captured by aerospace optical instruments, offer high spectral resolution and abundant information; hence, they are used extensively in military, meteorological, mapping, and other applications. However, their large data volume poses significant transmission and storage challenges for remote-sensing satellites and end users. Image compression can address this issue. Classical image coding applies transform techniques to decompose the original image into energy-compacted coefficients, which are then quantized for efficient compression; however, this can introduce significant blocking artifacts. Currently, image compression based on classical coding is outperformed by compression based on deep-learning networks; in particular, end-to-end deep-learning models demonstrate excellent image-compression performance. Nevertheless, most learning-based compression frameworks are designed for visible-light images and focus primarily on spatial redundancy, which results in suboptimal compression of multispectral images. Therefore, this study proposes a learning-based multispectral-image-compression network to address these challenges.

Methods The network adopts a variational-autoencoder architecture that incorporates rate-distortion optimization and a hyper-prior entropy model. Because the human eye is not equally sensitive to information at different frequencies, the network first employs a pooling convolutional network to decompose the input image into high- and low-frequency components. These components are then fed into separate high- and low-frequency feature-extraction networks, which are constructed from SSFE (spatial and spectral feature extraction), attention, and activation-function modules. Dense connections between layers are used to extract multiscale and contextual information from latent features across different 
frequencies. The extracted latent features are quantized and compressed into a bitstream by an arithmetic encoder. Simultaneously, the latent features are fed into the prediction network of the hyper-prior entropy model, which extracts side information and generates a probability-distribution model to facilitate decoding. The decoder is structurally symmetric to the encoder and restores the feature components to their original frequency components through the inverse operations. Finally, a dual-attention module fuses the high- and low-frequency components to generate the reconstructed image, completing the compression process.

Results and Discussions To verify the compression performance of the proposed method on multispectral images, we conducted experiments on 8- and 12-band multispectral images drawn from open-source datasets. The proposed method was compared with two classical image-compression algorithms (JPEG2000 and 3D-SPIHT), a video-coding method (H.266/VVC), and two learning-based image-compression algorithms (Joint and DBSSFE-Net) using three evaluation indices: PSNR (peak signal-to-noise ratio), MS-SSIM (multiscale structural similarity index), and MSA (mean spectral angle). The experimental results show that the proposed FDDBFE-Net yields higher PSNR values than the competing algorithms, with average improvements of 0.89 dB, 1.14 dB, and 1.87 dB over the DBSSFE, Joint, and VVC algorithms, respectively. Evaluation with the MS-SSIM index shows that the proposed model best preserves structural similarity to the original image, with improvements of 1.56 dB, 0.96 dB, and 2.95 dB over the DBSSFE, Joint, and VVC algorithms, respectively. Furthermore, the spectral-reconstruction results show that the proposed method yields the minimum spectral angle, indicating that the reconstructed image has the smallest 
spectral loss and is closest to the original image in quality. The proposed method reduces spectral loss by 13.1%, 9.5%, and 20.2% relative to the DBSSFE, Joint, and VVC algorithms, respectively. On the 12-band images, the disadvantages of the classical methods are particularly evident. Compared with DBSSFE-Net, the proposed algorithm yields a PSNR higher by 2.5 dB, an MS-SSIM higher by 2.2 dB, and an MSA lower by 30.6%; compared with the Joint algorithm, a PSNR higher by 0.9 dB, an MS-SSIM higher by 0.4 dB, and an MSA lower by 5.29%; and compared with the VVC algorithm, a PSNR higher by 3.4 dB, an MS-SSIM higher by 3.9 dB, and an MSA lower by 34.9%. Additionally, the proposed algorithm achieves the shortest encoding and decoding times on a graphics processing unit (GPU), whereas its decoding time on a central processing unit (CPU) is longer, owing to the added frequency-decomposition and synthesis modules. Overall, the proposed algorithm outperforms the other algorithms investigated in terms of compression performance.

Conclusions In this study, a multispectral-image-compression network based on a variational autoencoder was proposed. The network has an end-to-end symmetric structure and embeds several key techniques. Considering the spatial multiscale characteristics and spectral nonstationarity of multispectral images, a double-branch frequency-decomposition feature-extraction method was proposed; it effectively extracts the spatial and interspectral features of the images, enhances attention to different channels, and improves the robustness of the model. Experimental results show that the proposed model achieves excellent performance on multispectral-image datasets, surpassing the conventional JPEG2000, 3D-SPIHT, and H.266/VVC compression methods. Furthermore, it outperforms the DBSSFE-Net and Joint algorithms, which are likewise based on a variational-autoencoder structure.
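The pooling-based frequency decomposition described in Methods can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: average pooling serves as the low-pass operation and nearest-neighbour upsampling restores the original size, with the high-frequency component taken as the residual. The actual network uses learned convolutional layers, and the function name is ours, not the paper's.

```python
import numpy as np

def frequency_decompose(img, pool=2):
    """Split an image cube of shape (bands, H, W) into low- and
    high-frequency components (illustrative sketch, not the paper's network).

    Average pooling acts as a low-pass filter; the residual after
    upsampling carries the high-frequency detail, so low + high == img.
    """
    bands, h, w = img.shape
    assert h % pool == 0 and w % pool == 0, "size must be divisible by pool"
    # Average-pool each band over non-overlapping pool x pool blocks (low-pass).
    low_small = img.reshape(bands, h // pool, pool, w // pool, pool).mean(axis=(2, 4))
    # Nearest-neighbour upsample back to the original resolution.
    low = np.repeat(np.repeat(low_small, pool, axis=1), pool, axis=2)
    high = img - low  # high-frequency residual
    return low, high
```

Because the high-frequency branch is defined as an exact residual, the two components recombine losslessly, which is why the decoder can restore the original frequency content by the inverse operation.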
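The rate-distortion optimization with a hyper-prior entropy model can be illustrated by the sketch below, assuming a per-element Gaussian model over rounded latents as in hyper-prior codecs. The names `mu` and `sigma` stand in for the mean and scale predicted by the hyper-decoder, and `lam` is an illustrative trade-off weight; none of these reflect the paper's actual parameter settings.

```python
import numpy as np
from math import erf, sqrt, log2

def _phi(x):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def latent_rate_bits(y_hat, mu, sigma):
    """Estimated total bits for rounded latents y_hat under a per-element
    Gaussian entropy model (mu, sigma are illustrative stand-ins for the
    hyper-decoder's predictions)."""
    bits = 0.0
    for y, m, s in zip(np.ravel(y_hat), np.ravel(mu), np.ravel(sigma)):
        # Probability mass of the quantization bin [y - 0.5, y + 0.5).
        p = _phi((y - m + 0.5) / s) - _phi((y - m - 0.5) / s)
        bits += -log2(max(p, 1e-12))
    return bits

def rd_loss(y_hat, mu, sigma, mse, lam, num_pixels):
    # Rate-distortion objective L = R + lambda * D: bits per pixel plus
    # lambda-weighted mean squared error of the reconstruction.
    return latent_rate_bits(y_hat, mu, sigma) / num_pixels + lam * mse
```

The sketch makes the role of the side information concrete: sharper (small-sigma) predictions from the hyper-prior concentrate probability mass on the actual latent values, so the arithmetic encoder spends fewer bits on them.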
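Of the three evaluation indices, MSA is the one specific to spectral fidelity. A minimal sketch of the standard spectral-angle definition is given below: for each pixel, the angle between its reference and reconstructed spectral vectors is averaged over the image (the paper's exact normalization may differ, and the function name is ours).

```python
import numpy as np

def mean_spectral_angle(ref, rec, eps=1e-12):
    """Mean spectral angle (radians) between reference and reconstructed
    multispectral cubes of shape (bands, H, W). Standard definition:
    per-pixel arccos of the normalized dot product of spectral vectors,
    averaged over all pixels. Smaller is better."""
    bands = ref.shape[0]
    a = ref.reshape(bands, -1)  # one spectral vector per pixel column
    b = rec.reshape(bands, -1)
    dot = (a * b).sum(axis=0)
    norm = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    cos = np.clip(dot / (norm + eps), -1.0, 1.0)  # guard numerical drift
    return float(np.arccos(cos).mean())
```

Note that the angle is invariant to per-pixel scaling of the spectrum, so MSA isolates spectral-shape distortion from brightness errors that PSNR already captures.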