Classification network for 3D point cloud based on spatial structure convolution and attention mechanism
Objective 3D point cloud classification is a crucial task with diverse applications in computer vision, robotics, and autonomous driving. Recent advances in computing hardware have enabled researchers to apply deep learning methods to 3D point cloud recognition. Current deep learning-based methods for 3D point cloud classification typically divide the feature information captured by a network into two distinct parts: global and local features. Global features describe the overall shape and structure of the point cloud, whereas local features capture finer detail around individual points. By leveraging both, these methods achieve high accuracy in point cloud classification. Edge convolution (EdgeConv) is currently the most widely used method for local feature extraction in 3D point cloud classification. It incorporates relative position vectors into feature encoding to capture the characteristics of local structures. However, when local structures in a 3D point cloud are similar, encoding relative positions may yield similar features, leading to poor classification results. Furthermore, encoding local features alone may be insufficient, because the correlation between local and global features also matters. Current methods frequently employ attention mechanisms that learn attention scores from global features and weight local features accordingly, thereby establishing a correlation between local and global features. However, these methods may not fully exploit global feature information and can therefore yield suboptimal classification results.

Method To address these challenges, this study proposes a novel 3D point cloud classification network based on spatial structure convolution (SSConv) and an attention mechanism. The proposed network architecture
consists of two parts: a local feature encoding (LFE) module and a global feature encoding (GFE) module. The former uses SSConv to encode local features from both location and structure, whereas the latter learns a global feature representation from raw coordinate data. Furthermore, to enable effective correlation and complementarity between feature information, we introduce an attention mechanism that adaptively adjusts global features through weighting operations. The LFE module comprises two operations: graph construction and feature extraction. For graph construction, the LFE module applies the K-nearest neighbor (KNN) algorithm to identify adjacent points and build a graph structure. For feature extraction, it uses SSConv, which is built on a multilayer perceptron. Compared with EdgeConv, SSConv introduces a relative position vector between adjacent points. This operation increases the correlation distance between raw input features (that is, makes them less correlated), enriches the structural information of each local region, and enhances the spatial expressiveness of the extracted high-level semantic information. To capture more effective local structure features, location and structure are encoded separately. In particular, the location encoding branch encodes the coordinate information on its own to obtain richer location features that describe the spatial location of each point. Meanwhile, the structure encoding branch encodes the relative position vector on its own to learn the structural information of the local region, describing the overall geometric structure of the local neighborhood. The GFE module maps raw coordinate data to high-dimensional features, which serve as the global feature representation of the point cloud. In addition, the module includes an attention mechanism to strengthen the correlation between local and global features. In particular, attention weighting is used to guide the learning of global feature information by using
local feature information. This operation correlates and fuses local and global feature representations while preserving the raw feature information.

Result To evaluate the proposed network, experiments were conducted on the publicly available ModelNet40 dataset, which consists of 9 843 training models and 2 468 testing models in 40 classes. Classification performance was measured using metrics such as overall accuracy (OA) and mean class accuracy (mAcc). The proposed model was compared against four pointwise methods, two convolution-based methods, two graph convolution-based methods, and four attention mechanism-based methods. The experimental results show that the proposed network performs well on point cloud classification and effectively represents both local and global features. The proposed method achieves an OA of 93.0%, outperforming the dynamic graph convolutional neural network (DGCNN) by 0.1 percentage points, PointWeb by 0.7 percentage points, and PointCNN by 0.8 percentage points. In addition, its mAcc reaches 89.7%. An experiment was also designed to validate the efficacy of SSConv: replacing SSConv with EdgeConv in the network reduces OA on ModelNet40 by 0.5 percentage points, demonstrating that SSConv is better suited to local feature representation than EdgeConv. Another experiment examined the diversity of the input features of SSConv, measuring feature correlation with Euclidean, cosine, and correlation distances. The results indicate that SSConv increases diversity among input features more effectively than EdgeConv. Furthermore, visualizations of intermediate-layer features show that SSConv learns more distinctive features.

Conclusion The proposed network model achieves better classification results, with an OA of 93.0% and an mAcc of 89.7%, surpassing
those of existing methods. The proposed spatial structure convolution (SSConv) effectively increases the variability of input features, allowing the model to learn more diverse local feature representations of objects. The proposed attention-based global feature encoding method effectively adjusts features and captures the relationship between local and global feature information while preserving global features. In summary, the proposed network model extracts fine-grained features well and achieves good classification performance.
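The abstract does not spell out the SSConv formula, so the following is only a minimal plain-Python sketch of the baseline it is compared against: KNN graph construction followed by the EdgeConv input used in DGCNN, where each edge (i, j) is represented by the concatenation of x_i and x_j - x_i before a shared MLP. It also computes the three distances named in the Result paragraph (Euclidean, cosine, and correlation), averaged over all pairs of edge inputs, as one plausible way to score input-feature diversity. The function names, the toy point cloud, k = 2, and the pairwise-mean diversity score are illustrative assumptions, not the authors' code.

```python
import math


def knn(points, k):
    """For each point, return the indices of its k nearest neighbors (self excluded)."""
    neighbors = []
    for i, p in enumerate(points):
        # (squared Euclidean distance, index) pairs; ties are broken by index.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def edgeconv_inputs(points, neighbors):
    """EdgeConv (DGCNN) edge input for every edge (i, j): concat(x_i, x_j - x_i)."""
    edges = []
    for i, nbrs in enumerate(neighbors):
        xi = points[i]
        for j in nbrs:
            rel = [b - a for a, b in zip(xi, points[j])]  # relative position x_j - x_i
            edges.append(list(xi) + rel)
    return edges


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    # 1 - cos(angle); raises ZeroDivisionError for an all-zero vector.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def correlation_distance(u, v):
    # 1 - Pearson correlation, i.e. cosine distance after mean-centering each vector.
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine_distance([a - mu for a in u], [b - mv for b in v])


def mean_pairwise(features, dist):
    """Average distance over all unordered pairs; a larger value means more diverse inputs."""
    total, count = 0.0, 0
    for a in range(len(features)):
        for b in range(a + 1, len(features)):
            total += dist(features[a], features[b])
            count += 1
    return total / count


# Hypothetical toy point cloud with four points.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
nbrs = knn(points, k=2)
edges = edgeconv_inputs(points, nbrs)
for name, d in [("euclidean", euclidean), ("cosine", cosine_distance),
                ("correlation", correlation_distance)]:
    print(name, round(mean_pairwise(edges, d), 4))
```

The script prints one mean pairwise distance per measure. Computing the same scores for two different input encodings of the same neighborhoods is the kind of comparison the diversity experiment describes.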
Keywords: point cloud; edge convolution (EdgeConv); spatial structure; attention mechanism; classification
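To make the two metrics reported in the Result paragraph concrete, the following minimal sketch computes OA (correct predictions over all test samples) and mAcc (per-class accuracy averaged over classes) from label lists. The labels are hypothetical toy data, and the function names are illustrative. Because ModelNet40's classes differ in size, the two metrics can diverge, which is why both are reported.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """OA: fraction of all test samples that are classified correctly."""
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_class_accuracy(y_true, y_pred):
    """mAcc: accuracy computed per class, then averaged over the classes in y_true."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical labels: class 0 has four samples, classes 1 and 2 have two each.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 0, 2, 0]
print("OA:", overall_accuracy(y_true, y_pred))
print("mAcc:", mean_class_accuracy(y_true, y_pred))
```

On these toy labels OA is 0.75 while mAcc is about 0.67, because both misclassified samples belong to the smaller classes, whose per-class accuracy counts as much as that of the larger class.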