Frequency Decomposition and Double-Branch Feature Extraction for Multispectral-Image-Compression Network
Objective Multispectral images, captured by aerospace optical instruments, offer high spectral resolution and abundant information; hence, they are used extensively in military, meteorological, mapping, and other applications. However, their large data volume poses significant transmission and storage challenges for remote-sensing satellites and end users. Image compression can address this issue. Classical image coding applies transform techniques to decompose the original image into energy-compacted coefficients, which are then quantized for efficient compression; however, this can introduce significant blocking artifacts. Currently, image compression based on classical coding is outperformed by compression based on deep-learning networks; in particular, end-to-end deep-learning models demonstrate excellent image-compression performance. Nevertheless, most learning-based compression frameworks are designed for visible-light images and focus primarily on spatial redundancy, which results in suboptimal compression of multispectral images. Therefore, this study proposes a learning-based multispectral-image-compression network to address these challenges.

Methods The network adopts a variational-autoencoder architecture that incorporates rate-distortion optimization and a hyper-prior entropy model. Because the human eye is not equally sensitive to information at different frequencies, the network first employs a pooling convolutional network to decompose the input image into high- and low-frequency components. These components are then fed into separate high- and low-frequency feature-extraction networks, which are constructed from SSFE (spatial and spectral feature extraction), attention, and activation-function modules. Dense connections between layers are used to extract multiscale and contextual information from latent features across different 
frequencies. The extracted latent features are quantized and compressed into a bitstream by an arithmetic encoder. Simultaneously, the latent features are fed into the prediction network of the hyper-prior entropy model, which extracts side information and generates a probability-distribution model to facilitate decoding. The decoder is structurally symmetric to the encoder and restores the feature components to their original frequency components through the inverse operations. Finally, a dual-attention module fuses the high- and low-frequency components to generate the reconstructed image, completing the compression process.

Results and Discussions To verify the compression performance of the proposed method on multispectral images, we conducted experiments on 8- and 12-band multispectral images drawn from open-source datasets. The proposed method was compared with two classical image-compression algorithms (JPEG2000 and 3D-SPIHT), a video-coding method (H.266/VVC), and two learning-based image-compression algorithms (Joint and DBSSFE-Net) using three evaluation indices: PSNR (peak signal-to-noise ratio), MS-SSIM (multiscale structural similarity index), and MSA (mean spectral angle). The experimental results show that the proposed FDDBFE-Net yields higher PSNR values than the competing algorithms, with average improvements of 0.89 dB, 1.14 dB, and 1.87 dB over the DBSSFE, Joint, and VVC algorithms, respectively. Evaluation with the MS-SSIM index shows that the proposed model best preserves structural similarity to the original image, with improvements of 1.56 dB, 0.96 dB, and 2.95 dB over the DBSSFE, Joint, and VVC algorithms, respectively. Furthermore, the spectral-reconstruction results show that the proposed method yields the minimum spectral angle, indicating that the reconstructed image has the smallest 
spectral loss and is closest to the original image in quality. The proposed method reduces spectral loss by 13.1%, 9.5%, and 20.2% relative to the DBSSFE, Joint, and VVC algorithms, respectively. On the 12-band images, the disadvantages of the classical methods are particularly evident. Compared with DBSSFE-Net, the proposed algorithm yields a PSNR higher by 2.5 dB, an MS-SSIM higher by 2.2 dB, and an MSA lower by 30.6%; compared with the Joint algorithm, a PSNR higher by 0.9 dB, an MS-SSIM higher by 0.4 dB, and an MSA lower by 5.29%; and compared with the VVC algorithm, a PSNR higher by 3.4 dB, an MS-SSIM higher by 3.9 dB, and an MSA lower by 34.9%. Additionally, the proposed algorithm achieves the shortest encoding and decoding times on a graphics processing unit (GPU), whereas its decoding time on a central processing unit (CPU) is longer, owing to the added frequency-decomposition and synthesis modules. Overall, the proposed algorithm outperforms the other algorithms investigated in terms of compression performance.

Conclusions In this study, a multispectral-image-compression network based on a variational autoencoder was proposed. The network has an end-to-end symmetric structure and embeds several key techniques. Considering the spatial multiscale characteristics and spectral nonstationarity of multispectral images, a double-branch frequency-decomposition feature-extraction method was proposed; it effectively extracts the spatial and interspectral features of the images, enhances attention to different channels, and improves the robustness of the model. Experimental results show that the proposed model achieves excellent performance on multispectral-image datasets, surpassing the conventional JPEG2000, 3D-SPIHT, and H.266/VVC compression methods. Furthermore, it outperforms the DBSSFE-Net and Joint algorithms, which are likewise based on a variational-autoencoder structure.
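The pooling-based frequency decomposition described in Methods can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: average pooling serves as the low-pass operation and nearest-neighbour upsampling restores the original size, with the high-frequency component taken as the residual. The actual network uses learned convolutional layers, and the function name is ours, not the paper's.

```python
import numpy as np

def frequency_decompose(img, pool=2):
    """Split an image cube of shape (bands, H, W) into low- and
    high-frequency components (illustrative sketch, not the paper's network).

    Average pooling acts as a low-pass filter; the residual after
    upsampling carries the high-frequency detail, so low + high == img.
    """
    bands, h, w = img.shape
    assert h % pool == 0 and w % pool == 0, "size must be divisible by pool"
    # Average-pool each band over non-overlapping pool x pool blocks (low-pass).
    low_small = img.reshape(bands, h // pool, pool, w // pool, pool).mean(axis=(2, 4))
    # Nearest-neighbour upsample back to the original resolution.
    low = np.repeat(np.repeat(low_small, pool, axis=1), pool, axis=2)
    high = img - low  # high-frequency residual
    return low, high
```

Because the high-frequency branch is defined as an exact residual, the two components recombine losslessly, which is why the decoder can restore the original frequency content by the inverse operation.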
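The rate-distortion optimization with a hyper-prior entropy model can be illustrated by the sketch below, assuming a per-element Gaussian model over rounded latents as in hyper-prior codecs. The names `mu` and `sigma` stand in for the mean and scale predicted by the hyper-decoder, and `lam` is an illustrative trade-off weight; none of these reflect the paper's actual parameter settings.

```python
import numpy as np
from math import erf, sqrt, log2

def _phi(x):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def latent_rate_bits(y_hat, mu, sigma):
    """Estimated total bits for rounded latents y_hat under a per-element
    Gaussian entropy model (mu, sigma are illustrative stand-ins for the
    hyper-decoder's predictions)."""
    bits = 0.0
    for y, m, s in zip(np.ravel(y_hat), np.ravel(mu), np.ravel(sigma)):
        # Probability mass of the quantization bin [y - 0.5, y + 0.5).
        p = _phi((y - m + 0.5) / s) - _phi((y - m - 0.5) / s)
        bits += -log2(max(p, 1e-12))
    return bits

def rd_loss(y_hat, mu, sigma, mse, lam, num_pixels):
    # Rate-distortion objective L = R + lambda * D: bits per pixel plus
    # lambda-weighted mean squared error of the reconstruction.
    return latent_rate_bits(y_hat, mu, sigma) / num_pixels + lam * mse
```

The sketch makes the role of the side information concrete: sharper (small-sigma) predictions from the hyper-prior concentrate probability mass on the actual latent values, so the arithmetic encoder spends fewer bits on them.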
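Of the three evaluation indices, MSA is the one specific to spectral fidelity. A minimal sketch of the standard spectral-angle definition is given below: for each pixel, the angle between its reference and reconstructed spectral vectors is averaged over the image (the paper's exact normalization may differ, and the function name is ours).

```python
import numpy as np

def mean_spectral_angle(ref, rec, eps=1e-12):
    """Mean spectral angle (radians) between reference and reconstructed
    multispectral cubes of shape (bands, H, W). Standard definition:
    per-pixel arccos of the normalized dot product of spectral vectors,
    averaged over all pixels. Smaller is better."""
    bands = ref.shape[0]
    a = ref.reshape(bands, -1)  # one spectral vector per pixel column
    b = rec.reshape(bands, -1)
    dot = (a * b).sum(axis=0)
    norm = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    cos = np.clip(dot / (norm + eps), -1.0, 1.0)  # guard numerical drift
    return float(np.arccos(cos).mean())
```

Note that the angle is invariant to per-pixel scaling of the spectrum, so MSA isolates spectral-shape distortion from brightness errors that PSNR already captures.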