Viewport-independent and deformation-unaware no-reference omnidirectional image quality assessment
Objective With the rapid development of the virtual reality (VR) industry, the omnidirectional image has become an important medium of visual representation for VR, and it may degrade during acquisition, transmission, processing, and storage. Omnidirectional image quality assessment (OIQA) is an evaluation technique that aims to quantitatively describe the degradation of omnidirectional images, and it plays a crucial role in algorithm improvement and system optimization. The omnidirectional image has some inherent characteristics, i.e., geometric deformation in the polar regions and semantic information concentrated in the equatorial region, and user viewing behavior conspicuously affects its perceptual quality. Early OIQA methods, which simply fuse these inherent characteristics into 2D-IQA, do not consider user viewing behavior and thus obtain suboptimal performance. Building on the viewport representation, which is in line with user viewing behavior, some recent deep learning-based OIQA methods achieve promising performance by taking a predicted viewport sequence as the model input and computing the degradation on it. However, predicting the viewport sequence is difficult, and viewport extraction requires a series of pixel-wise computations, which leads to a heavy computational load and hampers application in industrial environments. To address these problems, we propose a new no-reference OIQA model, which introduces an equirectangular modulated deformable convolution (EquiMdconv) that can simultaneously deal with the irregular semantics and the regular deformation caused by equirectangular projection, without a predicted viewport sequence.

Method We propose a viewport-independent and deformation-unaware no-reference OIQA model. It is composed of three parts: a prior-guided patch sampling (PPS) module, a deformation-unaware feature extraction (DUFE) module, and an intra- and inter-patch attention aggregation (A-EPAA) module. The PPS module samples a set of patch images according to a prior probability distribution in a slice-based manner to represent the quality information of the complete image. DUFE extracts the perceptual quality features of the input patch images while accounting for their irregular semantics and regular deformation. It contains eight blocks, and each block comprises an EquiMdconv layer, a 1 × 1 convolutional layer, a batch normalization layer, and a 3 × 3 max pooling layer. The EquiMdconv layer employs modulated deformable convolution, which introduces learnable offset parameters to model distortions in the images more accurately; furthermore, we incorporate fixed offsets based on distortion regularity factors into the deformable convolution's offsets to effectively eliminate the regular deformation. A-EPAA comprises a convolutional block attention module (CBAM) and a patch attention (PA) module: CBAM reweights the perceptual quality features along the channel and spatial dimensions, and PA adjusts the contribution weights among patch images for the overall quality assessment. We train the proposed model on the CVIQ, OIQA, and JUFE databases, splitting each database into 80% for training and 20% for testing. We sample 10 patch images from each omnidirectional image, and the size of each patch image is set to 224 × 224. All experiments are implemented on a server with an NVIDIA RTX A5000 GPU. The adaptive moment estimation (Adam) optimizer is used to optimize our model; we train for 300 epochs on the CVIQ and OIQA databases and 20 epochs on the JUFE database, with a learning rate of 0.0001 and a batch size of 16.
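For illustration, a minimal sketch of the slice-based sampling in PPS is given below, assuming an equator-biased cosine prior over latitude and a uniform choice within each longitude slice; the abstract does not specify the exact prior, and the function name and parameters are ours.

```python
import numpy as np

def prior_guided_patch_sampling(erp, n_patches=10, patch=224, rng=None):
    """Slice-based patch sampling over an ERP image (sketch).

    The equator-biased cosine prior is an assumption; the paper's exact
    prior probability distribution may differ.
    """
    rng = rng or np.random.default_rng()
    h, w = erp.shape[:2]
    # One vertical slice per patch so the samples cover all longitudes.
    x_starts = np.array_split(np.arange(w - patch + 1), n_patches)
    # Latitude of each candidate patch center, mapped to (-pi/2, pi/2).
    lat = (0.5 - (np.arange(h - patch + 1) + patch / 2) / h) * np.pi
    prior = np.clip(np.cos(lat), 0, None)  # favor equatorial rows
    prior /= prior.sum()
    patches = []
    for xs in x_starts:
        x = rng.choice(xs)                   # uniform within the slice
        y = rng.choice(len(prior), p=prior)  # prior-guided latitude
        patches.append(erp[y:y + patch, x:x + patch])
    return patches
```

With the training setup above (10 patches of 224 × 224 per image), this covers the full longitude range while favoring the semantically richer equatorial band.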
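The EquiMdconv layer can likewise be sketched on top of torchvision.ops.deform_conv2d, which already supports the modulated (mask-weighted) variant. The 1/cos(latitude) horizontal stretch used here as the distortion regularity factor, the clamp near the poles, and the assumption that the input spans the full latitude range are ours; the paper's exact formulation may differ.

```python
import math
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class EquiMdconv(nn.Module):
    """Equirectangular modulated deformable convolution (sketch)."""

    def __init__(self, in_ch, out_ch, k=3, stride=1):
        super().__init__()
        self.k, self.stride, self.pad = k, stride, k // 2
        self.weight = nn.Parameter(torch.empty(out_ch, in_ch, k, k))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        self.bias = nn.Parameter(torch.zeros(out_ch))
        # Head predicting 2*k*k learnable offsets plus k*k modulation masks.
        self.head = nn.Conv2d(in_ch, 3 * k * k, k, stride, self.pad)
        nn.init.zeros_(self.head.weight)
        nn.init.zeros_(self.head.bias)

    def fixed_offset(self, h, w, device):
        # ERP stretches content horizontally by 1/cos(latitude); widen the
        # kernel by the same factor (the clamp near the poles is an assumption).
        lat = (0.5 - (torch.arange(h, device=device) + 0.5) / h) * math.pi
        stretch = 1.0 / lat.cos().clamp(min=0.2) - 1.0          # (h,)
        dx = torch.arange(self.k, device=device) - self.k // 2  # kernel cols
        off = torch.zeros(2 * self.k * self.k, h, w, device=device)
        for i in range(self.k):
            for j in range(self.k):
                # torchvision interleaves offsets as (dy, dx) per position;
                # only the x component gets the fixed equirectangular term.
                off[2 * (i * self.k + j) + 1] = (stretch * dx[j])[:, None]
        return off.unsqueeze(0)  # (1, 2*k*k, h, w)

    def forward(self, x):
        out = self.head(x)
        offset, mask = out[:, :2 * self.k ** 2], out[:, 2 * self.k ** 2:]
        # Learnable offsets handle irregular semantics; the fixed offsets
        # compensate the regular projection deformation.
        offset = offset + self.fixed_offset(*offset.shape[-2:], offset.device)
        return deform_conv2d(x, offset, self.weight, self.bias,
                             stride=self.stride, padding=self.pad,
                             mask=torch.sigmoid(mask))
```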
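Given such a layer, one DUFE block follows the composition stated above: an EquiMdconv layer, a 1 × 1 convolution, batch normalization, and 3 × 3 max pooling. The channel widths and the pooling stride in this sketch are assumptions.

```python
class DUFEBlock(nn.Module):
    """One of the eight DUFE blocks (sketch), reusing EquiMdconv above."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.equi = EquiMdconv(in_ch, out_ch)  # deformation-compensated features
        self.proj = nn.Conv2d(out_ch, out_ch, kernel_size=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        return self.pool(self.bn(self.proj(self.equi(x))))
```

With the assumed stride of 2, eight such blocks reduce a 224 × 224 patch to a 1 × 1 feature map before aggregation.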
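The PA step of A-EPAA can be read as learned contribution weights over the patch features. The following sketch, with assumed layer sizes, gates per-patch quality scores by a softmax over patches; the paper's actual aggregation may differ.

```python
class PatchAttention(nn.Module):
    """PA step of A-EPAA (sketch): weight per-patch scores into one image score."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # per-patch quality score
        self.gate = nn.Linear(dim, 1)   # per-patch contribution logit

    def forward(self, feats):           # feats: (B, n_patches, dim)
        w = torch.softmax(self.gate(feats), dim=1)  # contribution weights
        return (w * self.score(feats)).sum(dim=1)   # (B, 1) image score
```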
Result We conduct experiments on three databases, namely, CVIQ, OIQA, and JUFE, and demonstrate the performance of the proposed model by comparing it with nine viewport-independent models and five viewport-dependent models. To ensure a persuasive comparison, we adopt the Pearson linear correlation coefficient (PLCC) and Spearman's rank correlation coefficient (SRCC) as performance criteria. The results indicate that, compared with the state-of-the-art viewport-dependent model Assessor360, our model reduces the parameters by 93.7% and the floating point operations by 95.4%. Compared with MC360IQA, which has a similar model size, our model increases the SRCC by 1.9%, 1.7%, and 4.3% on the CVIQ, OIQA, and JUFE databases, respectively.

Conclusion Our viewport-independent and deformation-unaware no-reference OIQA model thoroughly considers the characteristics of the omnidirectional image. It can effectively extract quality features and accurately assess the quality of omnidirectional images at a limited computational cost.