Weakly supervised 3D visual grounding based on pseudo-text query generation and position awareness

张宇琦¹, 罗寒¹, 杨昱威¹, 金钊¹, 严华¹

Author information

1. College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China

Abstract

3D point cloud visual grounding plays an important role in applications such as autonomous driving and VR/AR. Most existing point cloud visual grounding methods rely on detailed manual descriptions of each target object, which are time-consuming and labor-intensive to produce. To overcome the dependence of vision-language tasks on text annotations, prior work has proposed pseudo-text generation and feature replacement methods that achieve visual grounding and image editing in the 2D domain without text annotations. Building on these 2D methods, this paper proposes a weakly supervised 3D visual grounding method that automatically generates pseudo-text and achieves position awareness. Experiments on the public datasets ScanRefer and Nr3D/Sr3D demonstrate the effectiveness and superior performance of the proposed method.

Key words

weakly supervised learning / 3D point cloud / 3D visual grounding / position awareness / pseudo-text generation

Publication year

2024

现代计算机 (Modern Computer)

中大控股

Impact factor: 0.292

ISSN: 1007-1423
