Pure camera-based bird's-eye-view perception on the vehicle side and infrastructure side: a review
As a key technology for 3D perception in the autonomous driving domain, pure camera-based bird's-eye-view (BEV) perception aims to generate a top-down representation of the surrounding traffic environment using only the 2D image information captured by cameras. In recent years, it has gained considerable attention in the computer vision research community. The potential of BEV is immense because it can represent image features from multiple camera viewpoints in a unified space and provide explicit position and size information for target objects. While most BEV methods focus on perception with ego-vehicle sensors, the community has gradually recognized the importance of intelligent roadside cameras for extending perception beyond the visual range of the ego vehicle. However, this novel and growing research field has not been reviewed recently. This paper presents a comprehensive review of pure camera-based BEV perception technology organized by camera deployment and camera angle into three categories: 1) vehicle-side single-view perception, 2) vehicle-side surround-view perception, and 3) infrastructure-side fixed-view perception. Meanwhile, the typical processing flow, which comprises three primary parts (dataset input, BEV model, and task inference output), is introduced. In the task inference output section, four typical tasks in the 3D perception of autonomous driving (i.e., 3D object detection, 3D lane detection, BEV map segmentation, and high-definition map generation) are described in detail. To support convenient retrieval, this study summarizes the supported tasks and official links for various datasets and provides open-source code links for representative BEV models in a table format. Simultaneously, the performance of various BEV models on public datasets is analyzed and compared.

To the best of our knowledge, three types of challenging BEV problems must be resolved. 1) Scene uncertainty: in an open-road scenario, many scenes never appear in the training dataset. These scenarios include extreme weather conditions, such as dark nights, strong winds, heavy rain, and thick fog. A model's reliability must not degrade in these unusual circumstances; however, the majority of BEV models suffer considerable performance degradation when exposed to varying road scenarios. 2) Scale uncertainty: autonomous driving perception tasks involve many targets of extreme scale. For example, in a roadside scenario, placing a camera on a traffic signal or streetlight pole at least 3 m above the ground helps detect farther targets. However, when facing extremely small distant targets, existing BEV models suffer from serious false and missed detections. 3) Camera parameter sensitivity: most existing BEV models depend on precisely calibrated intrinsic and extrinsic camera parameters during training and evaluation. The performance of these methods degrades drastically when noisy extrinsic parameters are used or unseen intrinsic parameters are supplied.

Meanwhile, a comprehensive outlook on the development of pure camera-based BEV perception is given. 1) Vehicle-to-infrastructure (V2I) cooperation: V2I cooperation refers to the integration of information from the vehicle side and the infrastructure side to achieve the visual perception tasks of autonomous driving under communication bandwidth constraints. The design and implementation of a vehicle-infrastructure integrated perception algorithm can bring remarkable benefits, such as covering blind spots, expanding the field of view, and improving perception accuracy. 2) Vehicle-to-vehicle (V2V) cooperation: V2V cooperation means that connected autonomous vehicles (CAVs) share their collected data with each other under communication bandwidth constraints. CAVs can collaborate to compensate for data shortages and expand the field of view for vehicles in need, thereby augmenting perception capabilities, boosting detection accuracy, and improving driving safety. 3) Multitask learning: the purpose of multitask learning is to optimize multiple tasks simultaneously, improving the efficiency and performance of algorithms while reducing model complexity. In BEV models, the generated BEV features are readily shared by many downstream tasks, such as 3D object detection and BEV map segmentation. A shared model greatly increases the parameter-sharing rate, saves computing costs, reduces training time, and improves model generalization performance. The objective of these endeavors is to provide a comprehensive guide and reference for researchers in related fields by thoroughly summarizing and analyzing existing research and future trends in the field of pure camera-based BEV perception.
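The camera parameter sensitivity discussed above stems from the geometry underlying all camera-to-BEV view transformations. As a minimal sketch (not a method from the reviewed literature; the function name and all numeric values are illustrative), the classic inverse-perspective-mapping idea back-projects a pixel onto the flat-ground plane z = 0 through a pinhole camera model, which makes explicit how errors in the intrinsics K or extrinsics (R, t) propagate directly into BEV positions:

```python
import numpy as np

def pixel_to_bev(u, v, K, R, t):
    """Back-project pixel (u, v) onto the ground plane z = 0 in world frame.

    Assumes a pinhole camera with intrinsic matrix K and extrinsics (R, t)
    mapping world to camera coordinates: x_cam = R @ x_world + t.
    """
    # Viewing-ray direction of the pixel in camera coordinates
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Express the ray and the camera centre in the world frame
    ray_world = R.T @ ray_cam
    cam_centre = -R.T @ t
    # Intersect the ray with the ground plane z = 0
    # (ray_world[2] must be nonzero, i.e. the ray is not parallel to the ground)
    s = -cam_centre[2] / ray_world[2]
    return cam_centre + s * ray_world  # 3D world point with z ~= 0

if __name__ == "__main__":
    # Illustrative calibration: a camera 3 m above the origin, looking straight down
    K = np.array([[100.0, 0.0, 64.0],
                  [0.0, 100.0, 64.0],
                  [0.0, 0.0, 1.0]])
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, -1.0, 0.0],
                  [0.0, 0.0, -1.0]])
    t = np.array([0.0, 0.0, 3.0])
    print(pixel_to_bev(64, 64, K, R, t))   # principal point -> world origin
    print(pixel_to_bev(164, 64, K, R, t))  # 100 px offset -> 3 m offset in BEV
```

Because the recovered BEV coordinate scales with both the camera height (in t) and the focal length (in K), even small calibration noise shifts every projected point, which is consistent with the drastic degradation under noisy extrinsics noted above; the flat-ground assumption is itself a further source of error that learned view transformers try to relax.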