This paper proposes a point cloud classification and segmentation network based on multi-scale and multi-modal self-supervised learning (multi-scale and multi-modal self-supervised learning network, MultiSM-Net), which fuses the strengths of the contrastive and generative learning paradigms. First, from the contrastive-learning perspective, the network obtains multi-scale point cloud subsets and multi-view images via sampling and projection, encodes them, and constructs a multi-scale contrastive loss between cross-modal features based on the "subset-view" mapping relationship. Then, from the generative-learning perspective, the network reconstructs the point cloud with an encoder-decoder model and computes a reconstruction loss. During training, the two losses are weighted and fused, minimizing the feature discrepancy across modalities while fully mining the deep semantic information of the point cloud. Accuracy comparison experiments are conducted on ModelNet40, ShapeNetCore, and ScanObjectNN. The results show that, compared with recent self-supervised methods such as Point-BERT, the proposed MultiSM-Net achieves significant improvements in pre-training representation ability and in accuracy on downstream classification and segmentation tasks.
3D Point Cloud Classification and Segmentation Network Based on Multi-scale and Multi-modal Self-Supervised Learning
This paper proposes a point cloud classification and segmentation model (multi-scale and multi-modal self-supervised learning network, MultiSM-Net) based on multi-scale and multi-modal self-supervised learning, which combines the advantages of two paradigms: contrastive learning and generative learning. First, from the perspective of contrastive learning, the model uses sampling and projection to obtain multi-scale point cloud subsets and multi-view images, encodes them, and builds a multi-scale contrastive loss between cross-modal features based on the subset-view mapping relationship. Furthermore, from the perspective of generative learning, the model uses an encoding-decoding model to reconstruct the point cloud and establishes a reconstruction loss. During training, the two losses are weighted and fused to minimize the feature differences between modalities while mining the deep semantic information of the point cloud. This paper conducts experiments on ModelNet40, ShapeNetCore, and ScanObjectNN. Experimental results show that, compared with the latest self-supervised models such as Point-BERT, the proposed MultiSM-Net achieves significantly improved pre-training representation capabilities and accuracy in downstream classification and segmentation tasks.
point cloud classification; point cloud segmentation; multi-modal data; self-supervised learning
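The abstract describes a training objective that weights and fuses a cross-modal contrastive loss (between point-subset features and view features) with a point cloud reconstruction loss. The following is a minimal NumPy sketch of such a fused objective, not the authors' implementation: the InfoNCE-style contrastive term, the Chamfer-distance reconstruction term, and the weight `alpha` are all illustrative assumptions.

```python
# Illustrative sketch of a weighted fusion of a cross-modal contrastive loss
# and a reconstruction loss. All function names and the weight `alpha` are
# assumptions, not the paper's actual formulation.
import numpy as np

def info_nce(point_feats, image_feats, temperature=0.07):
    """Symmetric similarity loss: matched point-subset/view pairs sit on the
    diagonal of the cosine-similarity matrix and should score highest."""
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    v = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    logits = (p @ v.T) / temperature               # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))             # cross-entropy on matches

def chamfer(p1, p2):
    """Chamfer distance between an original and a reconstructed point set."""
    d = np.linalg.norm(p1[:, None, :] - p2[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def total_loss(point_feats, image_feats, cloud, recon, alpha=0.5):
    """Weighted fusion of the contrastive and reconstruction objectives."""
    return alpha * info_nce(point_feats, image_feats) \
        + (1 - alpha) * chamfer(cloud, recon)
```

In this sketch, `point_feats` and `image_feats` are batch-aligned embeddings of point cloud subsets and their projected views, and `recon` is the decoder's output for `cloud`; lowering `alpha` shifts emphasis from cross-modal alignment toward reconstruction fidelity.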