A survey of cross-view geo-localization methods based on deep learning
Cross-view geo-localization aims to estimate the geographical location of a target by matching images captured from different viewpoints. The task is usually formulated as image retrieval, a formulation widely adopted in other artificial intelligence tasks such as person re-identification, vehicle re-identification, and image registration. Its main challenge lies in the drastic appearance changes across viewpoints, which degrade the retrieval performance of the model. Conventional cross-view geo-localization techniques rely on hand-crafted feature extraction, which limits localization precision. With the development of deep learning, deep learning-based cross-view geo-localization methods have become the mainstream approach. However, because the task involves multiple steps and extensive knowledge transfer, relatively few studies have been conducted in this field. In this paper, we present the first review of deep learning-based cross-view geo-localization methods. We analyze developments in data preprocessing, deep learning networks, feature attention modules, and loss functions in the context of cross-view geo-localization. To address the challenges in this field, the data preprocessing phase involves feature alignment, sampling strategies, and data augmentation. Feature alignment serves as prior knowledge for cross-view geo-localization and helps improve localization accuracy, and the use of generative adversarial networks (GANs) for feature alignment has emerged as a prominent trend. In addition, the imbalance in the numbers of satellite, ground, and drone images calls for effective sampling strategies and data augmentation techniques to balance training. Deep learning networks play a critical role in extracting image features, and their performance directly affects the accuracy of cross-view geo-localization. In general, methods that use a Transformer as the backbone network achieve higher accuracy than those based on ResNet, while methods that use ConvNeXt show the best performance. To further extract image features and enhance the discriminative power of the model, feature attention modules are designed. By learning effective attention mechanisms, these modules adaptively weight the input images or feature maps so that the model focuses on task-relevant regions or features. Experimental results show that these modules can exploit previously overlooked feature information, further extract image features, and enhance the discriminative power of the model. Loss functions are used to improve the fit of the model to the data and to accelerate its convergence. They guide the training direction of the entire network, enabling the model to learn better representations and further improving the accuracy of cross-view geo-localization. The most commonly used loss functions include contrastive loss and triplet loss. As these loss functions have improved, the sampling scheme has evolved from one-to-one to one-to-many, allowing the model to cover all samples during training and further enhancing its performance. By analyzing nearly one hundred influential publications, we summarize the characteristics of existing methods and propose ideas for improving cross-view geo-localization, which may inspire researchers to design new methods. We also test ten deep learning-based cross-view geo-localization methods on two representative datasets, taking into account the backbone network type and input image size of each method. On the University-1652 dataset, we evaluate the accuracy metrics R@1 and AP, the number of model parameters, and the inference speed. On the CVUSA dataset, we mainly evaluate four accuracy metrics: R@1, R@5, R@10, and R@Top1%. Experimental results show that stronger backbone networks and larger input image sizes improve model performance. Building on an extensive review of current state-of-the-art cross-view geo-localization methods, we also discuss the related challenges and provide several directions for further research on cross-view geo-localization.
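To make the shift from one-to-one to one-to-many sampling concrete, the sketch below implements a soft-margin triplet loss, log(1 + exp(alpha * (d_pos - d_neg))), which is one common triplet-loss variant in cross-view retrieval, plus a batch version that treats every non-matching reference in the batch as a negative. This is an illustrative sketch under stated assumptions, not the formulation of any specific surveyed method: the function names are hypothetical, alpha = 10 is an assumed scaling factor, and embeddings are plain Python lists that are L2-normalized before distances are taken.

```python
import math


def _normalize(v):
    # Scale a feature vector to unit L2 norm so distances depend only on angle.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _sq_dist(u, v):
    # Squared Euclidean distance between L2-normalized vectors (equals 2 - 2*cos).
    u, v = _normalize(u), _normalize(v)
    return sum((a - b) ** 2 for a, b in zip(u, v))


def soft_margin_triplet(anchor, positive, negative, alpha=10.0):
    # One-to-one: a single (anchor, positive, negative) triplet.
    # log(1 + exp(alpha * gap)) is a smooth hinge without a fixed margin.
    # alpha = 10.0 is an assumed default, not taken from the survey.
    gap = _sq_dist(anchor, positive) - _sq_dist(anchor, negative)
    return math.log1p(math.exp(alpha * gap))


def batch_triplet(queries, references, alpha=10.0):
    # One-to-many: queries[i] matches references[i]; every other reference
    # in the batch serves as a negative for that query.
    total, count = 0.0, 0
    for i, q in enumerate(queries):
        for j, neg in enumerate(references):
            if j != i:
                total += soft_margin_triplet(q, references[i], neg, alpha)
                count += 1
    return total / count


# Toy embeddings (hypothetical values): ground-view queries vs. aerial references.
queries = [[1.0, 0.0], [0.0, 1.0]]
references = [[0.9, 0.1], [0.1, 0.9]]
print(batch_triplet(queries, references))
```

Because each query is compared with all B - 1 non-matching references rather than with a single sampled negative, a batch of size B yields B(B - 1) triplets from the same forward pass. This is one simple way to use every sample in the batch during training.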
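The accuracy metrics named above can be computed from a query-by-reference similarity matrix. The sketch below shows Recall@K (R@1, R@5, R@10), R@Top1% (Recall@K with K set to 1% of the gallery size), and average precision (AP) for a query that may have several correct references. It is a minimal sketch with hypothetical function names. It assumes that reference i is the true match for query i in the recall functions, and that K for R@Top1% is rounded down with a floor of 1. Published evaluation code may round differently or break ties differently.

```python
def rank_of_true_match(scores, true_idx):
    # Count references scored strictly higher than the true match (0 = top-ranked).
    # Ties are resolved in the query's favour in this sketch.
    target = scores[true_idx]
    return sum(1 for s in scores if s > target)


def recall_at_k(sim, k):
    # sim[i][j] = similarity of query i to reference j; reference i is the true match.
    hits = sum(1 for i, row in enumerate(sim) if rank_of_true_match(row, i) < k)
    return hits / len(sim)


def recall_at_top_percent(sim, percent=1.0):
    # R@Top1%: K is a percentage of the gallery size (rounding rule is assumed).
    k = max(1, int(len(sim[0]) * percent / 100))
    return recall_at_k(sim, k)


def average_precision(scores, positives):
    # AP for one query; positives is a non-empty set of correct reference indices.
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, j in enumerate(order, start=1):
        if j in positives:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(positives)


# Toy 3x3 similarity matrix (hypothetical values).
sim = [[0.9, 0.1, 0.2], [0.3, 0.2, 0.8], [0.1, 0.4, 0.7]]
print(recall_at_k(sim, 1), recall_at_top_percent(sim))
print(average_precision([0.9, 0.2, 0.8, 0.1], {0, 3}))
```

In practice the similarity matrix would come from a matrix product of L2-normalized query and reference embeddings. Nested lists are used here only to keep the ranking logic explicit.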