
A survey of cross-view geo-localization methods based on deep learning

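Cross-view geo-localization is an important problem in computer vision. Because it enables real-time localization in environments where satellite positioning is unavailable, it has long attracted attention from fields such as image registration, navigation and positioning, and image retrieval. Traditional cross-view geo-localization methods extract hand-crafted features, which limits localization accuracy. With the development of deep learning, deep learning-based cross-view geo-localization methods have become the mainstream. However, because the task involves multiple steps and draws on knowledge transferred from many areas, the field still lacks a dedicated survey. This paper presents the first comprehensive survey of deep learning-based cross-view geo-localization methods from the perspective of the task framework. Following an overview of the problem, it summarizes developments in data preprocessing, deep learning networks, feature attention modules, and loss functions. By reviewing nearly a hundred high-impact papers, it distills the characteristics of the task and ideas for improvement, which may help researchers design new methods. In addition, 10 deep learning-based cross-view geo-localization methods are tested on each of two representative datasets, and the existing methods are evaluated in terms of accuracy, number of model parameters, and inference speed. Finally, based on this analysis and with practical applications in mind, the paper points out open problems in the field and discusses future trends, in the hope of providing a reference for interested researchers.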
A survey of cross-view geo-localization methods based on deep learning
Cross-view geo-localization aims to estimate a target geographical location by matching images taken from different viewpoints. It is usually treated as an image retrieval task, an approach widely adopted in artificial intelligence tasks such as person re-identification, vehicle re-identification, and image registration. The main challenge lies in the drastic appearance changes between viewpoints, which reduce the retrieval performance of the model. Conventional techniques for cross-view geo-localization rely on hand-crafted feature extraction, which restricts localization precision. With the development of deep learning, deep learning-based cross-view geo-localization methods have become the mainstream. However, because the task involves multiple steps and draws on knowledge transferred from many areas, few surveys have been conducted in this field. In this paper, we present the first review of cross-view geo-localization methods based on deep learning. We analyze developments in data preprocessing, deep learning networks, feature attention modules, and loss functions within the context of cross-view geo-localization tasks. To address the challenges in this field, the data preprocessing phase involves feature alignment, sampling strategies, and data augmentation. Feature alignment serves as prior knowledge for cross-view geo-localization and helps improve localization accuracy. The use of generative adversarial networks (GANs) has emerged as a prominent trend for feature alignment. Additionally, the discrepancy in sample quantities among satellite, ground, and drone images necessitates effective sampling strategies and data augmentation techniques to balance training. Deep learning networks play a critical role in extracting image features, and their performance directly affects the accuracy of cross-view geo-localization. In general, methods that use a Transformer as the backbone network achieve higher accuracy than those based on ResNet, while methods that use the ConvNeXt network show the best performance. To further extract image features and enhance the discriminative power of the model, feature attention modules need to be designed. By learning effective attention mechanisms, these modules adaptively weight the input images or feature maps so that the model focuses on task-relevant regions or features. Experimental results show that these modules can exploit previously unattended feature information. Loss functions are used to improve the fit of the model to the data and to accelerate its convergence. Their values guide the training direction of the entire network, enabling the model to learn better representations and further improve the accuracy of cross-view geo-localization. The most commonly used loss functions include contrastive loss and triplet loss. As these loss functions have improved, the number of samples compared by the model has evolved from one-to-one to one-to-many, allowing the model to cover all samples during training and further enhance its performance. By analyzing nearly a hundred influential papers, we summarize the characteristics of cross-view geo-localization tasks and propose ideas for improving them, which can inspire researchers to design new methods. We also test 10 deep learning-based cross-view geo-localization methods on 2 representative datasets. This evaluation considers the backbone network type and input data size of these methods. On the University-1652 dataset, we evaluate the accuracy metrics R@1 and AP, the number of model parameters, and the inference speed. On the CVUSA dataset, we mainly evaluate four accuracy metrics, namely R@1, R@5, R@10, and R@Top1. Experimental results show that a better backbone network and a larger input image size both improve the performance of the model. Building upon an extensive review of the current state-of-the-art cross-view geo-localization methods, we also discuss the related challenges and provide several directions for further research on cross-view geo-localization.
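The retrieval framing and the R@K and AP metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the evaluation code used in the survey: the function names, the cosine-similarity ranking, and the toy two-dimensional features are assumptions made for illustration, and AP is computed with the textbook definition, which may differ in detail from the official scripts of each dataset.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    sims = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: -sims[i])

def recall_at_k(queries, q_labels, gallery, g_labels, k):
    """Fraction of queries with at least one correct match in the top k."""
    hits = 0
    for query, label in zip(queries, q_labels):
        top_k = rank_gallery(query, gallery)[:k]
        if any(g_labels[i] == label for i in top_k):
            hits += 1
    return hits / len(queries)

def average_precision(query, q_label, gallery, g_labels):
    """Mean of the precision values at each rank where a correct match occurs."""
    hits = 0
    precisions = []
    for rank, i in enumerate(rank_gallery(query, gallery), start=1):
        if g_labels[i] == q_label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Hypothetical toy data: three gallery (e.g. satellite) features and two
# query (e.g. drone or ground) features, each tagged with a location label.
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
g_labels = ["A", "B", "C"]
queries = [[0.9, 0.1], [0.6, 0.5]]
q_labels = ["A", "B"]

if __name__ == "__main__":
    for k in (1, 3):
        print(f"R@{k} =", recall_at_k(queries, q_labels, gallery, g_labels, k))
    print("AP(query 2) =", average_precision(queries[1], q_labels[1], gallery, g_labels))
```

In practice the feature vectors would come from the backbone network (for example ResNet, a Transformer, or ConvNeXt), and the ranking would be computed in batch on a GPU rather than with Python loops.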

cross-view; geo-localization; image retrieval; deep learning; attention; drone

Zhou Bowen, Li Yang, Ma Xinji, Miao Zhuang, Zhang Rui


College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007

cross-view; geo-localization; image retrieval; deep learning; attention; drone

2024

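Journal of Image and Graphics
Institute of Remote Sensing Applications, Chinese Academy of Sciences; China Society of Image and Graphics; Institute of Applied Physics and Computational Mathematics, Beijing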


CSTPCD; Peking University Core Journals
Impact factor: 1.111
ISSN:1006-8961
Year, volume (issue): 2024, 29(12)