Multi-scale Transformer for View-based 3D Shape Analysis
View-based 3D shape analysis is a crucial research domain within the field of 3D computer vision. These techniques aim to recognise and retrieve 3D objects by aggregating features extracted from 2D images of the same object taken from different viewpoints. However, effectively exploring the relationships between different viewpoints, and using these relationships to aggregate features from multiple viewpoints, remain fundamental challenges in 3D shape analysis. Taking inspiration from the recent success of Transformer networks in modeling relationships, a novel multi-scale Transformer architecture, the Multi-View Multi-Scale Transformer (MVMST), is presented for 3D shape analysis. MVMST efficiently learns relationships between different views and integrates features from multi-view images into a global descriptor. While previous approaches use a Transformer with a global receptive field to model the relationships between multi-view features, MVMST makes use of multi-scale learning: a multi-scale Transformer models the relationships between multi-view features at different scales. In addition, a multi-scale fusion module is designed to merge the features processed by the multi-scale Transformer to obtain a more efficient multi-scale representation. With the view pooling module, these multi-scale representations from different views are eventually fused into a global descriptor of the 3D shape. Experiments on synthetic and real-world 3D object classification datasets demonstrate that the proposed method shows promising performance in 3D object classification tasks.
3D shape recognition; Transformer; multi-scale learning
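
The pipeline summarised in the abstract (multi-scale attention over view features, multi-scale fusion, view pooling) can be illustrated with a minimal NumPy sketch. It is not the MVMST implementation, and every design detail in it is an assumption made for illustration: a "scale" is taken to be an attention window over neighbouring views on a circular camera ring, attention is single-head with no learned projections, fusion is a plain average across scales, and view pooling is an element-wise maximum. The function names `windowed_self_attention` and `multi_scale_descriptor`, the window sizes, and the feature dimensions are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def windowed_self_attention(feats, window):
    """Single-head self-attention restricted to nearby views.

    Assumption: views lie on a circular camera ring, and each view attends
    only to views at most `window // 2` positions away. No learned
    query/key/value projections are used, to keep the sketch short.
    """
    num_views, dim = feats.shape
    idx = np.arange(num_views)
    diff = np.abs(idx[:, None] - idx[None, :])
    ring_dist = np.minimum(diff, num_views - diff)  # circular distance
    mask = ring_dist <= window // 2                 # always includes self

    scores = feats @ feats.T / np.sqrt(dim)         # scaled dot-product
    scores = np.where(mask, scores, -np.inf)        # block distant views
    return softmax(scores, axis=-1) @ feats


def multi_scale_descriptor(feats, windows=(3, 7, 12)):
    """Attend at several window sizes, fuse, then pool over views.

    Assumptions: fusion is a plain average across scales, and view pooling
    is an element-wise max over views.
    """
    per_scale = [windowed_self_attention(feats, w) for w in windows]
    fused = np.mean(per_scale, axis=0)  # placeholder multi-scale fusion
    return fused.max(axis=0)            # view pooling -> global descriptor


# Hypothetical input: 12 views, each with a 64-dimensional feature vector.
rng = np.random.default_rng(0)
views = rng.standard_normal((12, 64))
descriptor = multi_scale_descriptor(views)
print(descriptor.shape)  # (64,)
```

With 12 views, the largest window (12) lets every view attend to every other view, so it behaves like the global receptive field the abstract attributes to earlier approaches, while the smaller windows capture relationships among neighbouring views only.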