Weakly supervised 3D visual grounding based on pseudo-text query generation and position awareness
3D point cloud visual grounding plays a significant role in applications such as autonomous driving and VR/AR. Most existing point cloud visual grounding methods rely on detailed manual descriptions of each target object, which are time-consuming and labor-intensive to produce. To reduce the dependency on textual annotations in vision-language tasks, existing research has introduced pseudo-text generation and feature replacement, achieving text-free visual localization and image editing in the 2D domain. Building on these 2D methods, this paper proposes a weakly supervised 3D visual grounding method that automatically generates pseudo-text queries and incorporates position awareness. Experiments on public datasets such as ScanRefer and Nr3D/Sr3D demonstrate the effectiveness and superior performance of the proposed method.
weakly supervised learning; 3D point cloud; 3D visual grounding; position awareness; pseudo-text generation