Geometric attribute-guided 3D semantic instance reconstruction

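Chinese abstract:

Objective: Semantic instance reconstruction is an important problem for robots to understand the real world. Although much progress has been made in recent years, reconstruction performance is easily degraded by occlusion and noise. In particular, existing methods ignore both the prior geometric attributes of objects and their key detail information, so the reconstructed mesh models are coarse and inaccurate. To address this problem, a geometric attribute-guided semantic instance reconstruction algorithm is proposed.

Method: First, an object detector provides the bounding box parameters, and box sampling is applied to each target instance to obtain its incomplete local point cloud in the scene. Then, the feature embedding layer and Transformer layers of the encoder extract rich, key geometric details of each object to produce the corresponding local features, while the prior semantic information of the object helps the algorithm approximate the target shape more quickly. Finally, a feature transformer is designed to align the global features of the object; these are fused with the local features and fed into the shape generation module to reconstruct the object mesh.

Result: On the real-world ScanNet v2 dataset, the proposed algorithm was comprehensively compared with the latest existing methods, and the experimental results demonstrate its effectiveness. Compared with RfD-Net, which ranks second, the instance reconstruction metric improves by 8%. Detailed ablation experiments were also conducted to verify the effectiveness of each module of the algorithm.

Conclusion: The proposed geometric attribute-guided semantic instance reconstruction algorithm makes better use of the geometric attribute information of objects, producing finer and more accurate reconstruction results.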
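The box sampling step can be illustrated with a short NumPy sketch. It is not the authors' implementation: it assumes each detected box is given by a center, full extents (dx, dy, dz), and a yaw angle about the vertical z-axis, and the function names `box_sample` and `ball_sample` are hypothetical. Points are moved into the box's local frame and kept if they fall within half the extents; a ball sampler with the box's half-diagonal as radius is included only for comparison.

```python
import numpy as np

def box_sample(points, center, size, heading):
    """Return the points inside an oriented 3D box (in the box's local frame) and a mask.

    Hypothetical parameterization: `center` (3,), full extents `size` (3,),
    yaw `heading` in radians about the z-axis; `points` is an (N, 3) array.
    """
    shifted = points - center                      # move the box center to the origin
    c, s = np.cos(heading), np.sin(heading)
    # Transpose of the yaw rotation (world -> box frame for column vectors).
    rot_inv = np.array([[c, s, 0.0],
                        [-s, c, 0.0],
                        [0.0, 0.0, 1.0]])
    local = shifted @ rot_inv.T                    # row-vector form: undo the box's yaw
    inside = np.all(np.abs(local) <= size / 2.0, axis=1)
    return local[inside], inside

def ball_sample(points, center, radius):
    """Comparison baseline: keep points within `radius` of `center` (scene frame)."""
    inside = np.linalg.norm(points - center, axis=1) <= radius
    return points[inside], inside
```

With a 4 × 1 × 1 box at the origin and zero heading, box sampling keeps (0, 0, 0) and (1.9, 0, 0) but rejects (0, 1.5, 0), which lies beside the long side of the box; a ball whose radius is the box's half-diagonal also keeps that point. This mirrors, in a toy setting, why a box can admit fewer points from neighboring objects than a ball for elongated instances.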

English title: Geometric attribute-guided 3D semantic instance reconstruction

English abstract:

Objective: The objective of 3D vision is to capture the geometric and optical features of the real world from multiple perspectives and convert this information into digital form, enabling computers to understand and process it. 3D vision is an important aspect of computer graphics. Nonetheless, because of viewpoint occlusion, sparse sensing, and measurement noise, sensors provide only partial observations of the world, yielding a partial and incomplete representation of a scene. Semantic instance reconstruction has been proposed to address this problem. It converts 2D/3D data obtained from multiple sensors into a semantic representation of the scene, which includes modeling each object instance in the scene. Machine learning and computer vision techniques are applied to achieve high-precision reconstruction, and point cloud-based methods have shown prominent advantages. However, existing methods disregard the prior geometric and semantic information of objects, and their simple max-pooling operation loses key structural information of objects, resulting in poor instance reconstruction performance.

Method: In this study, a geometric attribute-guided semantic instance reconstruction network (GANet), consisting of a 3D object detector, a spatial Transformer, and a mesh generator, is proposed. The spatial Transformer is designed to exploit the geometric and semantic information of instances. After the 3D bounding box of each instance in the scene is obtained, box sampling based on the instance scale information extracts the real local point cloud of each target instance, and semantic information is then embedded for foreground point segmentation. Compared with ball sampling, box sampling reduces noise and captures more useful information. Next, the feature embedding layer and Transformer layers of the encoder extract rich and crucial geometric details of objects from coarse to fine to obtain the corresponding local features. The feature embedding layer also uses the prior semantic information of objects to help the algorithm quickly approximate the target shape, and the attention module in the Transformer integrates the correlations between points. The algorithm also uses the global object features provided by the detector. Because the scene space and the canonical space are inconsistent, a feature space Transformer is designed to align the global object features. Finally, the fused features are sent to the mesh generator for mesh reconstruction. The loss function of GANet consists of two parts: the detection loss and the shape loss. The detection loss is the weighted sum of the instance confidence, semantic classification, and bounding box estimation losses. The shape loss consists of three parts: the Kullback-Leibler divergence between the predicted distribution and the standard normal distribution, the foreground point segmentation loss, and the occupancy estimation loss. The occupancy estimation loss is the cross-entropy between the predicted and ground-truth occupancy values of the spatial candidate points.

Result: The method was compared with the latest methods on the ScanNet v2 dataset, using the computer aided design (CAD) models provided by Scan2CAD, which cover 8 categories, as ground truth for training. The mean average precision of semantic instance reconstruction increased by 8% compared with the second-ranked method, RfD-Net. The average precision for the bathtub, trash bin, sofa, chair, and cabinet categories is higher than that of RfD-Net. The visualization results show that GANet reconstructs object models that are more consistent with the scene. Ablation experiments were also conducted on the same dataset. The complete network outperformed four ablated variants of GANet, which respectively replaced box sampling with ball sampling, replaced the Transformer with PointNet, removed the semantic embedding of point cloud features, and removed the feature transformation. These results indicate that box sampling obtains more useful local point cloud information, that the Transformer-based point cloud encoder lets the network use more critical local structural information of the foreground point cloud during reconstruction, and that semantic embedding provides prior information for instance reconstruction. Feature space transformation aligns the global prior information of an object, further improving the reconstruction.

Conclusion: In this study, we proposed a geometric attribute-guided network. This network accounts for the complexity of scene objects and makes better use of their geometric and attribute information. The experimental results show that our network outperforms several state-of-the-art approaches. Current 3D-based semantic instance reconstruction algorithms have achieved good results, but acquiring and annotating 3D data is still relatively expensive. Future research can focus on using 2D data to assist semantic instance reconstruction.
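The loss composition described in the Method paragraph can be sketched in NumPy. This is a hedged illustration, not GANet's code: it assumes the latent distribution is a diagonal Gaussian parameterized by a mean `mu` and log-variance `logvar`, that the foreground segmentation and occupancy losses are both binary cross-entropies on raw logits, and that every weight defaults to 1.0 because the abstract does not state them. The detection terms are taken as precomputed scalars, and all function names are hypothetical.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims, averaged over the batch.

    Assumes a diagonal-Gaussian latent parameterized by mean and log-variance.
    """
    per_sample = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)
    return float(np.mean(per_sample))

def binary_cross_entropy(logits, targets):
    """Binary cross-entropy on raw logits, averaged over all elements.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)).
    """
    loss = np.maximum(logits, 0.0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return float(np.mean(loss))

def shape_loss(mu, logvar, fg_logits, fg_labels, occ_logits, occ_labels,
               w_kl=1.0, w_fg=1.0, w_occ=1.0):
    """KL term + foreground segmentation BCE + occupancy BCE (weights are hypothetical)."""
    kl = kl_to_standard_normal(mu, logvar)
    fg = binary_cross_entropy(fg_logits, fg_labels)       # per-point foreground vs. background
    occ = binary_cross_entropy(occ_logits, occ_labels)    # occupancy of spatial candidate points
    return w_kl * kl + w_fg * fg + w_occ * occ

def detection_loss(conf_loss, cls_loss, box_loss, w_conf=1.0, w_cls=1.0, w_box=1.0):
    """Weighted sum of precomputed confidence, classification, and box losses (weights hypothetical)."""
    return w_conf * conf_loss + w_cls * cls_loss + w_box * box_loss
```

With all-zero logits and a zero-mean, unit-variance latent, the KL term vanishes and each cross-entropy equals log 2, so `shape_loss` returns 2 log 2. Adding the detection and shape losses gives the two-part objective the abstract describes; whether the authors weight the two parts is not stated.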

English keywords:

scene reconstruction; three-dimensional point cloud; semantic instance reconstruction; mesh generation; object detection

Authors:

Wan Junhui (万骏辉), Liu Xinpu (刘心溥), Chen Lili (陈莉丽), Ao Sheng (敖晟), Zhang Peng (张鹏), Guo Yulan (郭裕兰)

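Affiliations:

School of Electronics and Communication Engineering, Sun Yat-sen University, Shenzhen 518107

College of Electronic Science, National University of Defense Technology, Changsha 410005

Artificial Intelligence Research Center, Defense Innovation Institute, Academy of Military Sciences, Beijing 100071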

Keywords:

scene reconstruction; 3D point cloud; semantic instance reconstruction; mesh generation; object detection

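Funding:

National Natural Science Foundation of China; National Natural Science Foundation of China; Guangdong Basic and Applied Basic Research Foundation; Guangdong Science and Technology Program; Shenzhen Science and Technology Program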

Grant numbers:

U20A20185; 61972435; 2022B1515020103; 2019B121203006; KQTD20190929172704911

Publication year:

2024

DOI:

10.11834/jig.230106

Journal of Image and Graphics

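Institute of Remote Sensing Applications, Chinese Academy of Sciences; China Society of Image and Graphics; Beijing Institute of Applied Physics and Computational Mathematics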

CSTPCD; Peking University Core Journal

Impact factor: 1.111

ISSN: 1006-8961

Year, volume (issue): 2024, 29(1)

Number of references: 25