Weakly supervised 3D visual grounding based on pseudo-text query generation and position awareness

3D point cloud visual grounding plays an important role in applications such as autonomous driving and VR/AR. Most existing point cloud visual grounding methods rely on detailed manual descriptions of each target object, which are time-consuming and labor-intensive to produce. To reduce the dependence of vision-language tasks on text annotations, prior work has proposed pseudo-text generation and feature replacement methods, which enable visual grounding and image editing in the 2D domain without text annotations. Building on these 2D methods, this paper proposes a weakly supervised 3D visual grounding method that automatically generates pseudo-text and achieves position awareness. Experiments on the public datasets ScanRefer and Nr3D/Sr3D demonstrate the effectiveness and superior performance of the proposed method.

Keywords: weakly supervised learning; 3D point cloud; 3D visual grounding; position awareness; pseudo-text generation

Zhang Yuqi (张宇琦), Luo Han (罗寒), Yang Yuwei (杨昱威), Jin Zhao (金钊), Yan Hua (严华)

College of Electronics and Information Engineering, Sichuan University, Chengdu 610065

2024

Modern Computer (现代计算机)
中大控股

Impact factor: 0.292
ISSN: 1007-1423
Year, volume (issue): 2024, 30(11)