Review of cross-view image geolocalization methods

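Abstract (translated from Chinese): Cross-view image geolocalization aims to establish accurate correspondences between images taken from different viewpoints and to geolocate them through image matching and geographic coordinate estimation. It is widely used in robot navigation, autonomous driving, and three-dimensional reconstruction. Traditional single-view image geolocalization methods are usually limited by factors such as dataset quality and scale, and their positioning accuracy is low. To overcome these limitations, researchers have in recent years proposed a series of cross-view image geolocalization methods that use image data from multiple viewpoints together and improve positioning accuracy by comparing and matching across views. Cross-view image matching methods can be classified in several ways. According to the type of cross-view images targeted, they can be divided into ground-satellite methods and drone-satellite methods. According to how image features are extracted and represented, they can be divided into methods based on hand-crafted features and methods based on features learned by deep neural networks. The latter can be further subdivided, according to whether view alignment is used and which alignment method is adopted, into three categories: cross-view image geolocalization without view alignment, with view alignment based on traditional image transformations, and with view alignment based on image generation. This review introduces these methods and compares their advantages and disadvantages; it also summarizes the datasets and evaluation methods commonly used for cross-view image geolocalization; finally, it looks ahead to application areas and future directions. Although cross-view geolocalization methods have achieved breakthroughs and progress, they still face a number of problems and challenges. This review therefore proposes possible solutions and priorities for future research, with the aim of promoting development and innovation in the field.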

English title: Review of cross-view image geolocalization methods

English abstract: Cross-view image geolocalization aims to determine the geographic location of images captured from different viewpoints, providing technical support for downstream tasks such as autonomous driving, robot navigation, and three-dimensional reconstruction. It involves matching images captured from different views, such as satellite and ground-level images, to accurately estimate their geographic coordinates. The task is difficult because of differences in viewpoint, scale, illumination, and appearance among images; it requires handling viewpoint variation and geometric transformations as well as a large search space of candidate locations. Early studies on image geolocalization were mainly based on single-view images: the location of a query image is obtained by searching an image database for a same-view reference image with prelabeled geolocation information. However, traditional single-view methods are usually limited by the quality and scale of the dataset, so their positioning accuracy is usually low. To overcome these limitations, researchers have proposed a series of cross-view image geolocalization methods that use image data from multiple perspectives and increase positioning accuracy by comparing and matching across perspectives. Given the complexity of geolocalization tasks and solutions, existing methods can be classified in multiple ways. This review introduces these classification schemes and representative methods of each type, and compares their advantages and disadvantages.

On the one hand, the diversification of platforms and the growth of multisource data provide more choices of source data for cross-view image geolocalization. Based on the sources of the images being matched, cross-view image geolocalization methods can be classified into ground-satellite and drone-satellite methods. Ground-satellite geolocalization locates a ground-view query image on a satellite image. Although it has many application prospects, the large change in viewing angle produces a huge visual difference between ground-view and satellite-view images, which makes matching difficult. Drone-satellite geolocalization, although a relatively new form of cross-view image geolocalization, is receiving increasing attention. Compared with a ground image, a drone image suffers less occlusion, covers more of the scene, and is closer to the satellite perspective. The release of University-1652, a geolocalization dataset containing drone, ground, and satellite images, provides data support for related research.

On the other hand, feature extraction can be used to solve the geolocation problem for horizontal-view images. Based on how image features are extracted and expressed, cross-view image geolocalization methods can be classified into those based on hand-crafted features and those based on features learned by deep neural networks. The former mainly rely on hand-crafted feature descriptors, such as scale-invariant feature transform (SIFT), speeded-up robust features (SURF), and oriented FAST and rotated BRIEF (ORB), which can be compared using Euclidean or cosine distance or fed directly into machine learning models such as support vector machines and random forests. However, these methods are not robust, cannot be fine-tuned for specific tasks, and have limited accuracy. With the rise of deep learning and the release of large annotated datasets such as CVUSA and CVACT, deep neural networks have been applied to cross-view image geolocalization. Based on whether view alignment is incorporated and how it is implemented, methods based on learned features can be subdivided into three categories: those without view alignment, those with view alignment based on traditional image transformations, and those with view alignment based on image generation. Methods without view alignment focus on end-to-end learning of sufficiently discriminative image feature representations, mainly using convolutional neural networks and attention mechanisms. They make full use of the content information in images but often ignore the spatial relationship between images of different views (such as ground and aerial views). Methods with view alignment based on traditional image transformations compensate for this defect: they use traditional image transformations, including the polar coordinate transformation and the perspective transformation, to explicitly provide additional spatial information for the input images, which narrows the domain gap between cross-view images. Methods with view alignment based on image generation usually first use generative neural networks to synthesize realistic images in the target view and then match the generated images with real ones to infer the corresponding geographic positions. The generative adversarial network is a representative technique in this category.

Apart from describing and categorizing methods, this review summarizes the commonly used datasets and their characteristics, including CVUSA, CVACT, and VIGOR for street view-satellite image matching, University-1652 for ground-drone-satellite image matching, and SUES-200 for drone-satellite image matching. It also summarizes the commonly used metrics for evaluating model performance, including Recall@K, average precision (AP), and Hit Rate-K. Methods are compared based on their performance on CVUSA, CVACT, and University-1652. Finally, this review offers an outlook on the application areas and future development directions of cross-view image geolocalization. Although the field has achieved considerable breakthroughs and progress, it still faces obstacles and challenges, such as the lack of multimodal datasets, difficulties in nonrigid scenarios, and the need for real-time and online geolocation. Possible solutions and future research priorities are proposed to further promote development and innovation in the field. They include creating multimodal geolocalization datasets, combining multiscale and multiview information to solve the geolocation problem in nonrigid scenes, and fusing other sensor data to achieve real-time geolocation.
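As an illustration of the retrieval setup and the Recall@K metric mentioned in the abstract, the following is a minimal sketch rather than the evaluation code of any surveyed method. It assumes that each query image has exactly one matching reference image (a one-to-one setting), that features have already been extracted as fixed-length vectors, and that cosine similarity is used for ranking. The function names `l2_normalize` and `recall_at_k` and the toy feature values are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def recall_at_k(query_feats, ref_feats, gt_index, k):
    """Fraction of queries whose true reference is among the k most similar.

    Assumption: gt_index[i] is the row of ref_feats that matches query i,
    and every query has exactly one correct reference.
    """
    q = l2_normalize(np.asarray(query_feats, dtype=np.float64))
    r = l2_normalize(np.asarray(ref_feats, dtype=np.float64))
    sim = q @ r.T                            # cosine similarity, queries x references
    topk = np.argsort(-sim, axis=1)[:, :k]   # indices of the k best references per query
    gt = np.asarray(gt_index).reshape(-1, 1)
    hits = (topk == gt).any(axis=1)
    return float(hits.mean())


# Toy data: three reference (e.g. satellite) features and three query features.
ref_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_feats = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.2]]
gt_index = [0, 1, 2]

for k in (1, 2):
    print(f"Recall@{k} = {recall_at_k(query_feats, ref_feats, gt_index, k):.3f}")
```

In this toy example the third query is ranked closest to the wrong reference, so it counts as a miss at K = 1 but as a hit at K = 2. Datasets with several valid references per query would need a different hit criterion.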

English keywords:

image geolocalization; cross-view; image matching; deep learning; representation learning; perspective transformation

Authors:

Sheng Yining (盛怡宁), Zhao Lijun (赵理君), Zhang Zheng (张正), Cui Shaolong (崔绍龙), Rao Mengbin (饶梦彬), Tang Ping (唐娉)

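Affiliations:

Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094

School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049

Information Science Academy, China Electronics Technology Group Corporation, Beijing 100043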

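Chinese keywords (translated):

image geolocalization; cross-view; image matching; deep learning; representation learning; perspective transformation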

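Funding:

"Future Star" Talent Program of the Aerospace Information Research Institute, Chinese Academy of Sciences; "Future Star" Talent Program of the Aerospace Information Research Institute, Chinese Academy of Sciences; Youth Innovation Promotion Association of the Chinese Academy of Sciences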

Project numbers:

2020KTYWLZX03; 2021KTYWLZX07; 2022127

Publication year:

2024

DOI:

10.11834/jig.230585

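Journal of Image and Graphics (中国图象图形学报)

Institute of Remote Sensing Applications, Chinese Academy of Sciences; China Society of Image and Graphics; Institute of Applied Physics and Computational Mathematics, Beijing

CSTPCD; Peking University Core Journals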

Impact factor: 1.111

ISSN: 1006-8961

Year, volume (issue): 2024, 29(9)