Visual-based localization determines the translation and orientation of the camera for an image observation with respect to a prebuilt 3D representation of the environment. It is an essential technology that enables intelligent interaction between computing devices and the real world. Compared with alternative positioning systems, the capability to estimate an accurate 6DOF camera pose, along with the flexibility and frugality of deployment, positions visual-based localization as a cornerstone of many applications, ranging from autonomous vehicles to augmented and mixed reality.

As a long-standing problem in computer vision, visual localization has made remarkable progress over the past decades. A primary branch of prior art relies on a preconstructed 3D map obtained by structure-from-motion (SfM) techniques. Such 3D maps, also known as SfM point clouds, store 3D points and per-point visual features. To estimate the camera pose, these methods typically establish correspondences between 2D keypoints detected in the query image and 3D points of the SfM point cloud through descriptor matching. The 6DOF camera pose of the query image is then recovered from these 2D-3D matches by leveraging geometric principles introduced by photogrammetry. Despite delivering sound and reliable performance, such a scheme often consumes several gigabytes of storage for a single scene, which results in expensive computational overhead and a prohibitive memory footprint for large-scale applications and resource-constrained platforms. Furthermore, it suffers from other drawbacks, such as costly map maintenance and privacy vulnerabilities. These issues pose a major bottleneck in real-world applications and have prompted researchers to shift their focus toward leaner solutions. Lightweight visual-based localization seeks to improve scene representations and the associated localization methods, making the resulting framework computationally tractable and memory-efficient without a notable loss in performance.

As background, this literature review first introduces several flagship frameworks for the visual-based localization task as preliminaries. These frameworks can be broadly classified into three categories: image-retrieval-based methods, structure-based methods, and hierarchical methods. The 3D scene representations adopted in these conventional frameworks, such as reference image databases and SfM point clouds, generally exhibit a high degree of redundancy, which causes excessive memory usage and inefficiency in distinguishing scene features for descriptor matching.

Next, this review provides a guided tour of recent advances that promote the compactness of 3D scene representations and the efficiency of the corresponding visual localization methods. From the perspective of scene representations, existing research efforts in lightweight visual localization can be classified into six categories. Within each category, this review analyzes its characteristics, application scenarios, and technical limitations while surveying representative works. First, several methods have been proposed to enhance memory efficiency by compressing the SfM point cloud. These methods reduce the size of SfM point clouds through a combination of techniques including feature quantization, keypoint subset sampling, and feature-free matching. Extreme compression rates, such as 1% and below, can be achieved with barely noticeable accuracy degradation.
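To make the conventional structure-based pipeline described above concrete, the following is a minimal, self-contained sketch of its two core steps: descriptor matching between query keypoints and an SfM point cloud, followed by pose recovery with PnP and RANSAC. The synthetic map, intrinsics, and noise levels are hypothetical placeholders rather than any specific published system; OpenCV and NumPy are assumed.

```python
# Minimal sketch of structure-based localization: match 2D query keypoints
# to 3D SfM points by descriptor nearest neighbours, then recover the 6DOF
# pose from the 2D-3D matches via PnP + RANSAC. All data here is synthetic.
import numpy as np
import cv2

rng = np.random.default_rng(0)

# Prebuilt SfM map: 500 3D points, each with a 128-D visual descriptor.
map_points = rng.uniform(-5.0, 5.0, size=(500, 3)).astype(np.float32)
map_desc = rng.normal(size=(500, 128)).astype(np.float32)

# Hypothetical pinhole intrinsics and a ground-truth pose, used only to
# synthesize query keypoints; a real system would run a feature detector.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]], dtype=np.float32)
rvec_gt = np.array([0.1, -0.2, 0.05], dtype=np.float32)
tvec_gt = np.array([0.3, 0.1, 8.0], dtype=np.float32)

visible = rng.choice(500, size=200, replace=False)
kpts_2d, _ = cv2.projectPoints(map_points[visible], rvec_gt, tvec_gt, K, None)
kpts_2d = kpts_2d.reshape(-1, 2)
# Query descriptors: the map descriptors of the visible points plus noise.
query_desc = map_desc[visible] + 0.05 * rng.normal(size=(200, 128)).astype(np.float32)

# Step 1: establish 2D-3D correspondences via descriptor matching.
matches = cv2.BFMatcher(cv2.NORM_L2).match(query_desc, map_desc)
pts_2d = np.float32([kpts_2d[m.queryIdx] for m in matches])
pts_3d = np.float32([map_points[m.trainIdx] for m in matches])

# Step 2: recover the 6DOF camera pose from the matches with PnP + RANSAC.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts_3d, pts_2d, K, None,
                                             reprojectionError=3.0)
if ok:
    print("inliers:", len(inliers), "| estimated t:", tvec.ravel())
```

Note how the map must retain a high-dimensional descriptor for every 3D point: this per-point feature storage is precisely what dominates the multi-gigabyte footprint that compression methods target.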
Employing line maps as scene representations has become another focus of research in lightweight visual localization. In human-made scenes characterized by salient structural features, substituting line maps for point clouds offers two major merits: 1) the abundance and rich geometric properties of line segments make line maps a concise option for depicting the environment; and 2) line features exhibit better robustness in weakly textured areas or under temporally varying lighting conditions. However, the lack of a unified line descriptor and the difficulty of establishing 2D-3D correspondences between 3D line segments and image observations remain the main challenges.

In the field of autonomous driving, high-definition maps constructed from vectorized semantic features have unlocked a new wave of cost-effective and lightweight visual localization solutions for self-driving vehicles.

Recent trends involve data-driven techniques that learn to localize. This end-to-end philosophy has given rise to two families of regression-based methods. Scene coordinate regression (SCR) methods eschew explicit feature extraction and matching; instead, they establish a direct mapping from observations to scene coordinates through regression. While a grounding in geometry remains essential for camera pose estimation in SCR methods, pose regression methods employ deep neural networks to map image observations directly to camera poses without any explicit geometric reasoning. Absolute pose regression techniques are akin to image retrieval approaches, with limited accuracy and generalization capability, whereas relative pose regression techniques typically serve as a postprocessing step following a coarse localization stage.

Neural radiance fields and related volumetric approaches have emerged as a novel form of neural implicit scene representation. While visual localization based solely on a learned volumetric implicit map is still in an exploratory phase, the progress made over the past year or two has already yielded impressive performance in terms of scene representation capability and localization precision.

Furthermore, this study quantitatively evaluates several representative lightweight visual localization methods on well-known indoor and outdoor datasets. Evaluation metrics including offline mapping time, storage demand, and localization accuracy are considered for comparison. The results reveal that SCR methods generally stand out among existing work, boasting remarkably compact scene maps and high localization success rates. Existing lightweight visual localization methods have dramatically pushed the performance boundary; however, challenges remain in scalability and robustness when the scene scale grows and when there is considerable visual disparity between query and mapping images. Therefore, extensive efforts are still required to promote the compactness of scene representations and to improve the robustness of localization methods. Finally, this review provides an outlook on development trends in the hope of facilitating future research.
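As a complement to the discussion of regression-based methods above, the following minimal sketch illustrates the scene coordinate regression idea: a small fully convolutional network regresses per-pixel 3D scene coordinates directly from an image, after which the pose can still be recovered geometrically from the predicted 2D-3D correspondences, e.g., with the PnP-RANSAC step of the earlier sketch. The architecture and training step are hypothetical simplifications (assuming PyTorch), not any specific published SCR model.

```python
# Minimal sketch of scene coordinate regression (SCR): a network replaces
# explicit feature extraction and matching by mapping an RGB image to a
# dense grid of 3D scene coordinates; geometry (PnP + RANSAC) still
# produces the final pose. Hypothetical toy architecture, not a real model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SceneCoordNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(                       # 8x spatial downsampling
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 3, 1),                       # 3 channels = (x, y, z)
        )

    def forward(self, image):                           # image:  (B, 3, H, W)
        return self.net(image)                          # coords: (B, 3, H/8, W/8)

model = SceneCoordNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One hypothetical training step: regress toward ground-truth scene
# coordinates (obtainable, e.g., from depth rendered off an SfM model).
image = torch.randn(1, 3, 480, 640)
gt_coords = torch.randn(1, 3, 60, 80)
optimizer.zero_grad()
loss = F.l1_loss(model(image), gt_coords)
loss.backward()
optimizer.step()
print("scene-coordinate loss:", loss.item())

# At test time, each output cell yields a 2D-3D correspondence (pixel centre
# of the cell <-> predicted 3D coordinate), which is fed to PnP + RANSAC.
```

The map is, in effect, baked into the network weights, which is why SCR methods achieve the remarkably compact scene representations noted in the evaluation above.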