3D point cloud visual grounding plays a significant role in applications such as autonomous driving and VR/AR. Most existing point cloud visual grounding methods rely on detailed manual text descriptions for each target object, which are time-consuming and labor-intensive to produce. To remove the dependence on textual annotations in vision-language tasks, prior work has introduced pseudo-text generation and feature replacement methods, achieving text-free visual grounding and image editing in the 2D domain. Building on these 2D methods, this paper proposes a weakly supervised 3D visual grounding method that automatically generates pseudo-text and achieves position awareness. Experiments on public datasets, including ScanRefer and Nr3D/Sr3D, demonstrate the effectiveness and superior performance of the proposed method.
Key words
weakly supervised learning/3D point cloud/3D visual grounding/position awareness/pseudo-text generation