Multi-scale Transformer for View-based 3D Shape Analysis

WEI Xin (卫鑫)¹, SUN Jian (孙剑)¹

Author information

1. School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China

Abstract

View-based 3D shape analysis is an important research area in 3D computer vision. These methods recognize and retrieve 3D shapes by aggregating features extracted from 2D images of the same shape captured from multiple viewpoints. However, how to effectively explore the relationships between different views, and how to use these relationships to aggregate multi-view image features, remain core open problems in 3D shape analysis. Inspired by the recent success of Transformer networks in relationship modeling, this work introduces a novel multi-scale Transformer architecture and proposes the Multi-View Multi-Scale Transformer (MVMST) for view-based 3D shape analysis. MVMST effectively learns the relationships between different views and aggregates the multi-view image features into a single, highly expressive global descriptor. Unlike previous approaches, which model the relationships among multi-view features with a Transformer whose receptive field is global, MVMST draws on multi-scale learning: it uses a multi-scale Transformer to model the relationships among multi-view image features at different scales. A multi-scale fusion module then merges the Transformer-processed features from all scales into a multi-scale representation that is more effective than any single-scale one. Finally, a view pooling module fuses the multi-scale representations of all views into a global descriptor of the 3D shape. Experiments on several synthetic and real-scanned 3D shape classification datasets show that the proposed method achieves promising performance on 3D shape classification.
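The abstract describes the MVMST pipeline only at a high level: a multi-scale Transformer over per-view features, a multi-scale fusion module, and view pooling. The NumPy sketch below illustrates one possible reading of that pipeline; it is not the authors' implementation. All of the following are assumptions: (1) a "scale" is how many neighboring views each view may attend to, with views ordered on a ring around the object, and the largest scale reproduces the global receptive field of earlier methods; (2) attention uses identity query/key/value projections instead of learned ones; (3) the multi-scale fusion module is replaced by a plain average over scales; (4) view pooling is an element-wise max over views. The names `windowed_self_attention`, `mvmst_descriptor`, and the default `windows` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def windowed_self_attention(feats, window):
    """Single-head self-attention over V view features (shape V x D).

    View i may attend only to views whose circular index distance from i
    is at most `window` (ASSUMPTION: views lie on a ring around the shape).
    A window >= V // 2 reduces to ordinary global attention.
    Identity Q/K/V projections are used (ASSUMPTION, no learned weights).
    """
    V, D = feats.shape
    scores = feats @ feats.T / np.sqrt(D)            # (V, V) scaled dot products
    idx = np.arange(V)
    dist = np.abs(idx[:, None] - idx[None, :])
    dist = np.minimum(dist, V - dist)                # circular view distance
    scores = np.where(dist <= window, scores, -np.inf)  # mask views outside the window
    return softmax(scores, axis=-1) @ feats          # (V, D); diagonal is never masked


def mvmst_descriptor(view_feats, windows=(1, 2, None)):
    """Hypothetical multi-scale aggregation of V view features into one descriptor.

    `windows` lists the attention scales; None stands for the global scale.
    """
    V = view_feats.shape[0]
    per_scale = []
    for w in windows:
        w_eff = V if w is None else w
        # Residual connection around the attention at this scale.
        per_scale.append(view_feats + windowed_self_attention(view_feats, w_eff))
    fused = np.mean(per_scale, axis=0)   # (V, D) stand-in for the fusion module
    return fused.max(axis=0)             # (D,) max view pooling (ASSUMPTION)


# Usage: 12 views with 64-dimensional per-view CNN features (random stand-ins).
rng = np.random.default_rng(0)
views = rng.standard_normal((12, 64))
desc = mvmst_descriptor(views)
print(desc.shape)  # (64,)
```

Under these assumptions, the small windows let each view interact mainly with its neighbors, while the global scale matches a conventional Transformer; replacing the average or the max with a learned module would change only the last two lines of `mvmst_descriptor`.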

Key words

3D shape analysis/3D shape recognition/Transformer/multi-scale learning

Funding

National Natural Science Foundation of China (12125104)

Publication year

2024

Chinese Journal of Engineering Mathematics (工程数学学报)

Xi'an Jiaotong University

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.302

ISSN: 1005-3085

Number of references: 36