Visual perception and understanding in degraded scenarios
Visual media such as images and videos are crucial means for humans to acquire, express, and convey information. The widespread application of foundational technologies such as artificial intelligence and big data has driven the gradual integration of image and video perception and understanding systems into all aspects of production and daily life. However, the emergence of massive applications also brings challenges. Specifically, in open environments, various applications generate vast amounts of heterogeneous data, which leads to complex visual degradation in images and videos. For instance, adverse weather conditions such as heavy fog can reduce visibility and cause the loss of details; data captured in rainy or snowy weather can exhibit deformations of objects or individuals caused by raindrops, which introduce structured noise; and low-light conditions can cause severe loss of detail and structural information. Visual degradation not only diminishes the visual presentation and perceptual experience of images and videos but also significantly affects the usability and effectiveness of existing visual analysis and understanding systems. In today's era of intelligence and information technology, with the explosive growth of visual media data, visual perception and understanding technologies, especially in challenging scenarios, hold significant scientific and practical value.

Traditional visual enhancement techniques fall into two categories: spatial-domain and frequency-domain methods. Spatial-domain methods directly process 2D spatial data and include grayscale transformation, histogram transformation, and spatial-domain filtering. Frequency-domain methods transform data into the frequency domain through models such as the Fourier transform, process it there, and then restore it to the spatial domain. The development of computer vision technology has given rise to more well-designed and robust visual enhancement algorithms, such as
dehazing algorithms based on the dark channel prior. Since the 2010s, rapid advances in artificial intelligence have enabled the development of many visual enhancement methods based on deep learning models. These methods can not only reconstruct damaged visual information but also further improve the visual presentation, comprehensively enhancing the perceptual experience of images and videos captured in challenging scenarios.

As computer vision technology becomes more widespread, intelligent visual analysis and understanding are penetrating many aspects of society, such as face recognition and autonomous driving. However, visual enhancement in traditional digital image processing frameworks mainly focuses on improving visual effects and ignores its impact on high-level analysis tasks. This oversight severely reduces the usability and effectiveness of existing visual understanding systems. In recent years, several visual understanding datasets for challenging scenarios have been established, leading to the development of numerous visual analysis and understanding algorithms for these scenarios. Domain transfer methods from ordinary scenes to challenging scenes are gaining attention as a way to further reduce reliance on such datasets. Coordinating and optimizing the relationship between visual perception and visual presentation, which are two different task objectives, is also an important research problem in visual computing.

To address the development needs of the visual computing field in challenging scenarios, this study extensively reviews the challenges of the aforementioned research, outlines developmental trends, and explores cutting-edge dynamics. Specifically, this study reviews technologies related to visual degradation modeling, visual enhancement, and visual analysis and understanding in challenging scenarios. In the section on visual data and degradation modeling, various methods for modeling image and video degradation processes
in different degradation scenarios are discussed. These methods include noise modeling, downsampling modeling, illumination modeling, and rain and fog modeling. For noise modeling, Poissonian-Gaussian noise modeling is the most commonly used. For downsampling modeling, classical methods include bicubic interpolation and blur kernels; noise, including JPEG compression artifacts, is also considered, and a recent comprehensive model jointly applies blurring, downsampling, and noise. For illumination modeling, the Retinex theory, which decomposes images into illumination and reflectance components, is one of the most widely used. For rain and fog modeling, images are generally decomposed into rain and background layers.

The section on traditional visual enhancement reviews the numerous visual enhancement algorithms developed to address the degradation of image and video information in adverse scenarios. Early algorithms often employed simple strategies, such as super-resolution methods based primarily on interpolation techniques. However, these methods are constrained by linear models and struggle to restore high-frequency details. Researchers have therefore proposed more sophisticated algorithms to address the complex degradation in adverse scenarios, including histogram equalization, Retinex-based methods, and filtering methods.

Deep neural networks have shown remarkable performance in fields such as image classification, object detection, and face recognition. They also perform excellently in low-level computer vision tasks such as super-resolution, style transfer, color conversion, and texture transfer. With the continuous evolution of deep neural network frameworks, researchers have proposed diverse visual enhancement methods. The section on visual enhancement based on deep learning models is organized by model architecture and discusses architectures such as convolutional neural networks, Transformer models, and diffusion models. Unlike traditional visual
enhancement, which aims to comprehensively improve human visual perception of images and videos, the new generation of visual enhancement and analysis methods also considers the interpretive performance of machine vision in degraded scenarios. The section on visual understanding technology in challenging scenarios discusses deep learning-based visual understanding and its corresponding datasets, and explores the collaborative computation of visual enhancement and understanding in challenging scenarios. Finally, based on this analysis, the study provides prospects for the future development of visual perception and understanding in adverse scenarios.

When facing complex degradation scenarios, real-world images may be influenced simultaneously by factors such as heavy rain and fog, dynamic lighting changes, low-light environments, and image corruption, which requires models to handle unknown and diverse image features. The current challenge lies in the fact that most existing models are designed for specific degradation scenarios; they embed a significant amount of scenario-specific prior knowledge and thus adapt poorly to other degradation scenarios. In addition, the construction of existing visual understanding models in adverse scenarios relies on downstream task information, including the target-domain data distribution, degradation priors, and pre-trained models for downstream tasks. This reliance makes robustness to arbitrary tasks and analysis models difficult to achieve. Moreover, most methods are limited to a specific machine analysis downstream task and cannot generalize to new downstream task scenarios. Finally, in recent years, large models have achieved significant accomplishments in various fields, and many studies have demonstrated their unprecedented potential in enhancement, reconstruction, and other low-level computer vision tasks. However, the high complexity of large models also presents
challenges, including substantial computational resource requirements, long training times, and difficulties in model optimization. At the same time, the generalization capability of models in adverse scenarios is a pressing challenge that calls for more comprehensive data construction strategies and more effective model optimization methods. How to improve the performance and reliability of large visual models in visual perception and understanding tasks in adverse scenarios remains a key unsolved problem.
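To make the composite degradation model discussed above concrete, the following minimal sketch chains the three classical steps, blurring, downsampling, and signal-dependent (Poissonian-Gaussian) noise. It is an illustrative assumption, not a method from the surveyed literature: the Gaussian kernel size, the noise parameters a and b, and the use of simple stride-based downsampling (in place of the more common bicubic resampling) are all choices made only to keep the sketch short.

```python
import numpy as np

def gaussian_kernel(size=7, sigma=1.5):
    """2D isotropic Gaussian blur kernel, normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def blur(img, kernel):
    """Valid-mode 2D convolution, accumulated tap by tap over the kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * img[i:i + h - kh + 1, j:j + w - kw + 1]
    return out

def degrade(img, scale=2, sigma_blur=1.5, a=0.01, b=1e-4, rng=None):
    """Classical degradation pipeline: blur -> downsample -> noise.

    The noise follows the Poissonian-Gaussian model: its per-pixel
    variance is a*y + b, a signal-dependent term plus a constant term.
    Parameters a and b here are illustrative, not calibrated values.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    y = blur(img, gaussian_kernel(sigma=sigma_blur))
    y = y[::scale, ::scale]  # stride-based downsampling keeps the sketch short
    noise = rng.normal(0.0, np.sqrt(a * np.clip(y, 0.0, 1.0) + b))
    return np.clip(y + noise, 0.0, 1.0)
```

Swapping in a different blur kernel, a bicubic resampler, or an additional JPEG compression stage turns this into the comprehensive joint model mentioned above, in which several degradations are composed rather than handled in isolation.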
adverse scenarios; visual perception; visual understanding; image and video enhancement; image and video processing; deep learning