A Survey of Deep Learning Methods for Foveated Rendering of 3D Scenes (三维场景注视点渲染深度学习方法综述)

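Abstract (translated from Chinese): Achieving real-time, photorealistic rendering on large high-resolution displays and head-mounted displays remains one of the main challenges in computer graphics. Foveated rendering exploits the limitations of the human visual system and adjusts image rendering quality according to the gaze point, greatly increasing rendering speed without degrading the quality perceived by the user. With the wide application of deep learning in rendering, many new deep learning-based foveated rendering methods have emerged. This paper reviews the latest methods in foveated rendering from the perspective of deep learning. First, it outlines background knowledge on human visual perception. Next, it briefly introduces the most representative non-deep-learning methods for foveated rendering, including adaptive resolution, geometric simplification, shading simplification, and hardware implementation, and summarizes their advantages and disadvantages. It then describes the evaluation criteria used in the paper to assess the different deep learning methods, including common evaluation metrics for foveated images and for gaze prediction. Deep learning methods for foveated rendering are then subdivided into super-resolution, denoising, completion, image synthesis, gaze prediction, and image applications, and each category is described and summarized in detail. Finally, the paper presents the problems and challenges that deep learning methods currently face. This discussion shows in greater detail the research prospects and development directions of deep learning in foveated rendering, and offers a reference for later researchers in choosing research directions and designing network architectures.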

English title: Deep learning-based foveated rendering in 3D space: a review

English abstract: The widespread adoption of virtual reality (VR) and augmented reality technologies across various sectors, including healthcare, education, military, and entertainment, has propelled head-mounted displays with high resolution and wide fields of view to the forefront of display devices. However, attaining a satisfactory level of immersion and interactivity is a primary challenge in VR, with latency potentially causing user discomfort in the form of dizziness and nausea. Multiple studies have underscored that a highly realistic VR experience that also maintains user comfort requires raising the screen's image refresh rate to 1800 Hz and keeping latency below 3 to 40 ms. Achieving real-time, photorealistic rendering at high resolution and low latency is a formidable objective. Foveated rendering is an effective approach to these issues: it adjusts the rendering quality across the image based on gaze position, maintaining high quality in the foveal area while reducing quality in the periphery. This technique leads to substantial computational savings and improved rendering speed without a perceptible loss in visual quality. While previous reviews have examined technical approaches to foveated rendering, they focused more on categorizing the implementation techniques. A comprehensive review from the machine learning perspective has yet to be undertaken. With the ongoing advances of machine learning in the rendering field, combining machine learning and foveated rendering is considered a promising research area, especially in postprocessing, where machine learning methods have great potential. Non-machine-learning methods inevitably introduce artifacts. By contrast, machine learning methods have a wide range of applications in the postprocessing stage of rendering, where they optimize and improve foveated rendering results and enhance the realism and immersion of foveated images in a manner unattainable through non-machine-learning approaches. Therefore, this work presents a comprehensive overview of foveated rendering from a machine learning perspective. In this paper, we first provide an overview of the background knowledge of human visual perception, including aspects of the human visual system, contrast sensitivity functions, visual acuity models, and visual crowding. Subsequently, this paper briefly describes the most representative non-machine-learning methods for foveated rendering, including adaptive resolution, geometric simplification, shading simplification, and hardware implementation, and summarizes these methods' features, advantages, and disadvantages. Additionally, we describe the criteria employed for method evaluation in this review, including evaluation metrics for foveated images and gaze-point prediction. Next, we subdivide machine learning methods into super-resolution, denoising, image reconstruction, image synthesis, gaze prediction, and image application. We provide a detailed summary of them in terms of four aspects: result quality, network speed, user experience, and the ability to handle objects. Among them, super-resolution methods commonly use more neural blocks in the foveal region and fewer in the peripheral region, resulting in super-resolution quality that varies by region. Similarly, foveated denoising usually performs fine denoising in the fovea and coarse denoising in the periphery, but denoising has yet to receive extensive attention. The initial attempt to integrate image reconstruction with gaze used generative adversarial networks (GANs), yielding promising outcomes. Later, some researchers combined direct prediction and kernel prediction for image reconstruction, which is also the state of the art in this field. Gaze prediction is a key development direction for future VR rendering and is mostly combined with saliency detection to predict the location of the gaze point. Substantial work remains in the field, and unfortunately only a tiny portion of the existing work runs in real time. Finally, we present the current problems and challenges that machine learning methods face. Our review of machine learning approaches in foveated rendering not only elucidates the research prospects and development directions but also provides insights for future researchers in choosing research directions and designing network architectures.

English keywords:

foveated rendering; deep learning; real-time rendering; eye fixations prediction; image reconstruction; super-resolution; ray tracing denoising

Authors:

Li Yingqun (李英群), Hu Xiao (胡啸), Xu Xiang (徐翔), Xu Yanning (徐延宁), Wang Lu (王璐)

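Affiliations:

School of Software, Shandong University, Jinan 250101

Shandong Key Laboratory of Blockchain Finance, Shandong University of Finance and Economics, Jinan 250014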

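Keywords (translated from Chinese):

foveated rendering; deep learning; real-time rendering; gaze prediction; image completion; super-resolution; path tracing denoising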

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China

Grant numbers:

2022YFB3303203; 62272275

Publication year:

2024

DOI:

10.11834/jig.230708

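Journal of Image and Graphics (中国图象图形学报)

Institute of Remote Sensing Applications, Chinese Academy of Sciences; China Society of Image and Graphics; Institute of Applied Physics and Computational Mathematics, Beijing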

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 1.111

ISSN: 1006-8961

Year, volume (issue): 2024, 29(10)