Zero-shot 3D Shape Classification Based on Semantic-enhanced Language-Image Pre-training Model
Recently, Contrastive Language-Image Pre-training (CLIP) has shown great potential for zero-shot 3D shape classification. However, there is a large modality gap between 3D shapes and texts, which limits further improvement of classification accuracy. To address this problem, a zero-shot 3D shape classification method based on semantic-enhanced CLIP is proposed in this paper. First, 3D shapes are represented as multi-view images. Then, to improve the ability to recognize unknown categories in zero-shot learning, a semantic descriptive text is generated for each view and its corresponding category by a visual-language generative model, using image captioning and visual question answering; these texts serve as a semantic bridge between the views and the category prompt texts. Finally, a fine-tuned semantic encoder aggregates the semantic descriptive texts into semantic descriptions of each category, which carry rich semantic information, are highly interpretable, and effectively reduce the semantic gap between views and category prompt texts. Experiments show that the proposed method outperforms existing zero-shot classification methods on the ModelNet10 and ModelNet40 datasets.
Keywords: 3D shape classification; Zero-shot learning; Contrastive Language-Image Pre-training (CLIP); Semantic descriptive text
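To make the view-based zero-shot pipeline summarized in the abstract more concrete, the following minimal sketch scores rendered views of a 3D shape against semantically enriched category prompts with an off-the-shelf CLIP model and averages the scores over views. The prompt strings, view file paths, and the simple mean-pooling aggregation are illustrative assumptions; the paper's actual method generates the category descriptions with a visual-language generative model and a fine-tuned semantic encoder rather than using hand-written prompts.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Off-the-shelf CLIP backbone (assumed checkpoint; the paper may use a different one).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Hypothetical semantically enriched category descriptions; in the paper these are
# derived from image captioning and visual question answering, not written by hand.
category_texts = [
    "a 3D rendering of a chair, an object with four legs, a flat seat and a backrest",
    "a 3D rendering of a table, an object with a flat top supported by legs",
    "a 3D rendering of a lamp, an object with a base, a stem and a shade",
]

# Hypothetical multi-view renderings of one 3D shape.
view_paths = ["view_00.png", "view_01.png", "view_02.png"]
views = [Image.open(p).convert("RGB") for p in view_paths]

# Encode all views and all category descriptions in one batch.
inputs = processor(text=category_texts, images=views, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image has shape (num_views, num_categories); softmax over categories,
# then mean-pool over views to obtain one score vector per shape (an assumed aggregation).
view_probs = outputs.logits_per_image.softmax(dim=-1)
shape_probs = view_probs.mean(dim=0)
predicted_category = category_texts[shape_probs.argmax().item()]
print(predicted_category)
```

Because CLIP's text and image embeddings live in a shared space, no training on the target categories is needed: adding a new class only requires adding its description string, which is what makes the zero-shot setting and the quality of the category descriptions central to the method.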