T-graph: Enhancing sparse-view camera pose estimation by pairwise translation graph

Xian Q.¹, van der Zwaag B.J.¹, Huang Y.¹, Jiao W.², Cheng H.²


Author information

1. Pervasive Systems Research Group, Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente
2. Department of Earth Observation Science, Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente

Abstract

© 2025 The Authors

Sparse-view camera pose estimation, which aims to recover 6-Degree-of-Freedom (6-DoF) poses from a limited number of unordered multi-view images, is fundamental yet challenging in remote sensing. Learning-based methods are more robust than traditional Structure-from-Motion (SfM) pipelines because they rely on dense high-dimensional features and implicit learning rather than sparse keypoints and limited geometric constraints. However, they often neglect pairwise translation cues between views, which leads to suboptimal performance in sparse-view scenarios. To address this limitation, we introduce T-Graph, a lightweight, plug-and-play module that enhances camera pose estimation in sparse-view settings. T-Graph takes paired image features as input, maps them through a Multilayer Perceptron (MLP), and constructs a fully connected translation graph in which nodes represent cameras and edges encode their translation relationships. It can be integrated into most existing learning-based models as an additional branch in parallel with the original prediction, preserving efficiency and ease of use. Furthermore, we introduce two pairwise translation representations, relative-t and pair-t, formulated under different local coordinate systems. While relative-t captures intuitive spatial relationships, pair-t offers a rotation-disentangled alternative. Together, the two representations improve adaptability across diverse application scenarios and further strengthen the module's robustness. We also propose an indicator, the Camera Axis Dispersion Ratio (CADR), to quantitatively assess which pairwise translation representation is better suited to the camera configuration of a given dataset. Extensive experiments with three representative methods (RelPose++, Forge and 8Pt-ViT) on public datasets (CO3D and IMC PhotoTourism) validate both the effectiveness and the generalizability of T-Graph. The results show consistent improvements across various metrics, most notably in camera center accuracy, which improves by up to 6% for 2 to 8 viewpoints.
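The following is a minimal NumPy sketch, under stated assumptions, of the two ingredients named above: a fully connected translation graph over N cameras and an MLP that maps paired image features to one 3-D translation per edge. It assumes world-to-camera extrinsics (x_c = R x_w + t) and uses the conventional relative translation t_ij = t_j - R_j R_i^T t_i (camera i's centre expressed in camera j's frame) as a stand-in for relative-t. The abstract does not give the exact relative-t or pair-t formulas, the feature extractor, or the training loss, so `EdgeMLP`, `translation_graph_targets`, `predict_translation_graph` and the L2 loss are hypothetical illustrations, not the authors' implementation.

```python
import itertools
import numpy as np


def random_rotation(rng):
    """Sample a proper 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:     # flip one axis so that det(q) == +1
        q[:, 0] = -q[:, 0]
    return q


def relative_translation(R_i, t_i, R_j, t_j):
    """Assumed relative-t stand-in: camera i's centre in camera j's frame
    (world-to-camera convention x_c = R x_w + t)."""
    R_ij = R_j @ R_i.T
    return t_j - R_ij @ t_i


def translation_graph_targets(Rs, ts):
    """Hypothetical helper: fully connected directed graph with one
    3-vector per ordered camera pair (i, j), i != j."""
    n = len(Rs)
    return {(i, j): relative_translation(Rs[i], ts[i], Rs[j], ts[j])
            for i, j in itertools.permutations(range(n), 2)}


class EdgeMLP:
    """Hypothetical two-layer MLP mapping concatenated pair features
    [f_i, f_j] to a 3-D translation. Weights are random here; in a real
    model they would be learned."""

    def __init__(self, feat_dim, hidden_dim, rng):
        self.W1 = rng.normal(scale=0.1, size=(2 * feat_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = rng.normal(scale=0.1, size=(hidden_dim, 3))
        self.b2 = np.zeros(3)

    def __call__(self, pair_feats):
        h = np.maximum(pair_feats @ self.W1 + self.b1, 0.0)  # ReLU
        return h @ self.W2 + self.b2


def predict_translation_graph(feats, mlp):
    """Batch every ordered pair of per-view features through the edge MLP."""
    pairs = list(itertools.permutations(range(len(feats)), 2))
    pair_feats = np.stack([np.concatenate([feats[i], feats[j]]) for i, j in pairs])
    return dict(zip(pairs, mlp(pair_feats)))


# Usage: 4 synthetic cameras with random poses and random 16-D view features.
rng = np.random.default_rng(0)
n_views, feat_dim = 4, 16
Rs = [random_rotation(rng) for _ in range(n_views)]
ts = [rng.normal(size=3) for _ in range(n_views)]
targets = translation_graph_targets(Rs, ts)
feats = rng.normal(size=(n_views, feat_dim))
preds = predict_translation_graph(feats, EdgeMLP(feat_dim, 32, rng))
# Assumed auxiliary L2 objective between predicted and target edge translations.
loss = np.mean([np.sum((preds[e] - targets[e]) ** 2) for e in targets])
print(len(targets), len(preds))  # 12 12
```

With N views the graph has N(N-1) directed edges, so every view pair contributes a translation cue even when N is small; this edge branch would sit alongside a model's original per-camera pose head rather than replace it.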

Keywords

Camera pose estimation; Pairwise translation representation; Sparse-view scenario


Publication year

2025

ISPRS Journal of Photogrammetry and Remote Sensing

ISSN: 0924-2716

Number of references: 54
