Point-Voxel Consistency Constraint Network for LiDAR Point Cloud Classification Under Urban Scenes
Objective Accurate and efficient point cloud classification plays a vital role in tasks such as scene understanding and digital twin city classification. Traditional classification methods manually extract features and construct discriminative models to classify point clouds. However, as point cloud acquisition density and data volume increase, it is difficult for traditional methods to achieve accurate and efficient point cloud classification. Recently developed deep learning-based point cloud processing methods have advanced point cloud classification. Among them, methods that use a single data structure, such as points only or voxels only, are prone to losing critical geometric features, whereas methods that fuse multiple data structures can learn multilevel and multiscale features from the different representations. However, it is difficult to balance the differences between these representations, which reduces classification accuracy. In addition, LiDAR point clouds acquired from complex urban scenes contain large amounts of noise and outliers that are difficult to process. These challenges remain open problems in current point cloud classification research.
Methods To address these problems, a point-voxel consistency constraint network (PVCC-Net) is proposed to accurately segment point clouds of different sizes in urban scenes. PVCC-Net adopts a dual-branch U-Net encoder-decoder structure. First, the point and voxel branches extract features from different receptive fields. The point branch extracts point-level geometric and semantic features through a local feature aggregation (LFA) module, which helps reduce the effects of feature redundancy and noise. The voxel branch expands the receptive field step by step, using a convolutional network to extract voxel features at different levels. The voxel format is regular and ordered in memory, which maintains the continuity of spatial information and compensates for the shortcomings of point clouds. The fine-grained point branch and the coarse-grained voxel branch cover spatial ranges at different resolutions, and combining this multilevel contextual information enhances the feature extraction capability. The point-voxel consistency constraint (PV-CC) module integrates the fine-grained and coarse-grained features and improves the adaptation between points and voxels by constraining the distance between feature branches of different granularities in the same layer of the network, which enables the model to produce more stable predictions. Subsequently, the point-voxel self-attention (PV-SA) mechanism fuses the point and voxel features while enhancing the expression of global features. Finally, the performance of the network is further improved by the weighted cross-entropy and Lovász loss functions, resulting in accurate and efficient point cloud classification in urban scenes.
Results and Discussion The proposed PVCC-Net is trained and evaluated on three urban scene datasets, namely Toronto3D, Semantic3D, and SensatUrban, achieving 97.97%, 93.80%, and 93.00% in overall accuracy (OA) and 82.92%, 75.70%, and 55.40% in mean intersection over union (mIoU), respectively. All experimental results outperform the baseline network (Table 2, Fig. 6, and Fig. 9). In addition, PVCC-Net achieves competitive results compared with other state-of-the-art methods, which demonstrates its strong generalizability (Tables 3 and 4). Notably, PVCC-Net not only maintains the integrity of the internal structure of each category but also makes the segmentation boundaries between categories clear and accurate (Figs. 4, 7, and 10). Comparative experiments and ablation studies demonstrate that features of different granularities have different semantic representation capabilities. Combining fine-grained point features with coarse-grained voxel features significantly improves the accuracy of point cloud classification, and the consistency constraint reduces the differences between features of different granularities by minimizing the feature distance, thereby improving the stability and robustness of the model (Table 5). The complexity analysis indicates that PVCC-Net has a higher number of parameters and FLOPs, mainly because the convolution and deconvolution operations in the voxel branch incur considerable computational cost. Nevertheless, its latency is close to that of the point-based and point-voxel fusion methods (Table 6).
Conclusions In this study, PVCC-Net is proposed for LiDAR point cloud classification in urban scenes. The network first aligns the distributions of fine-grained point features and coarse-grained voxel features through a point-voxel consistency constraint module. It then uses a point-voxel self-attention mechanism to capture long-distance context information, which enhances the global feature representation. Finally, it alleviates the class imbalance of urban point clouds via the square-root-weighted cross-entropy and Lovász loss functions to achieve accurate point cloud classification. On the Toronto3D, Semantic3D, and SensatUrban datasets, PVCC-Net improves the mIoU by 3.44, 0.90, and 2.30 percentage points, respectively, compared with RandLA-Net. In addition, the classification performance of PVCC-Net is comparable to that of other advanced methods. The comparative experiments and ablation studies show that deeply fusing fine-grained point features with coarse-grained voxel features enhances the capability of the model to extract complex features in urban scenes, and that further constraining the point and voxel features keeps their feature distributions consistent and improves the stability of the model predictions. However, PVCC-Net has a higher number of parameters and a higher computational cost. Therefore, in future research, we will explore the synergistic and complementary effects of points and voxels in a lightweight scene point cloud classification task.
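The abstract does not give the exact form of the consistency constraint or of the class weights, so the following NumPy sketch is only one plausible reading, not the authors' implementation. It assumes that the constraint is a mean squared L2 distance between each point feature and the feature of the voxel containing that point, and that the square-root weighting means weights proportional to the inverse square root of the class frequency. All function names (`voxelize_mean`, `consistency_loss`, `sqrt_class_weights`, `weighted_cross_entropy`) and the `eps` parameter are hypothetical.

```python
import numpy as np


def voxelize_mean(points, feats, voxel_size):
    """Average per-point features inside each occupied voxel (hypothetical helper).

    Returns the voxel features and, for every point, the index of its voxel.
    """
    coords = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions
    n_vox = int(inverse.max()) + 1
    sums = np.zeros((n_vox, feats.shape[1]), dtype=np.float64)
    np.add.at(sums, inverse, feats)  # scatter-add point features into their voxels
    counts = np.bincount(inverse, minlength=n_vox)[:, None]
    return sums / counts, inverse


def consistency_loss(point_feats, voxel_feats, inverse):
    """ASSUMED form: mean squared L2 distance between point and voxel features."""
    diff = point_feats - voxel_feats[inverse]  # gather each point's voxel feature
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def sqrt_class_weights(labels, num_classes, eps=1e-6):
    """ASSUMED form: weight_c = 1 / sqrt(frequency_c + eps)."""
    freq = np.bincount(labels, minlength=num_classes) / labels.size
    return 1.0 / np.sqrt(freq + eps)


def weighted_cross_entropy(logits, labels, weights):
    """Class-weighted cross-entropy, normalized by the sum of the applied weights."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(labels.size), labels]
    w = weights[labels]
    return float(np.sum(w * nll) / np.sum(w))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.uniform(0.0, 4.0, size=(1000, 3))      # toy point coordinates
    point_feats = rng.normal(size=(1000, 16))        # toy point-branch features
    voxel_feats, inv = voxelize_mean(pts, point_feats, voxel_size=1.0)
    labels = rng.integers(0, 5, size=1000)
    logits = rng.normal(size=(1000, 5))
    w = sqrt_class_weights(labels, num_classes=5)
    print("consistency:", consistency_loss(point_feats, voxel_feats, inv))
    print("weighted CE:", weighted_cross_entropy(logits, labels, w))
```

In a real network the voxel features would come from the voxel branch rather than from averaging point features; the averaging here only makes the sketch self-contained. The Lovász loss is omitted because its exact use is not described in the abstract.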
remote sensing; point cloud classification; voxel; consistency constraint; self-attention mechanism; urban scene
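The overall accuracy (OA) and mean intersection over union (mIoU) reported in the abstract are standard metrics that can be derived from a confusion matrix. The sketch below shows the usual definitions. The function names are hypothetical, and skipping classes that occur in neither the labels nor the predictions is an assumption, since benchmark-specific evaluation rules may differ.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    idx = y_true * num_classes + y_pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def overall_accuracy(cm):
    """Fraction of points whose predicted class matches the ground truth."""
    return float(np.trace(cm) / cm.sum())


def mean_iou(cm):
    """Mean of per-class IoU = TP / (TP + FP + FN).

    ASSUMPTION: classes absent from both labels and predictions are skipped.
    """
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = tp + fp + fn
    valid = denom > 0
    return float(np.mean(tp[valid] / denom[valid]))


if __name__ == "__main__":
    y_true = np.array([0, 0, 1, 1, 2, 2])
    y_pred = np.array([0, 1, 1, 1, 2, 0])
    cm = confusion_matrix(y_true, y_pred, num_classes=3)
    print("OA:", overall_accuracy(cm))
    print("mIoU:", mean_iou(cm))
```

Because mIoU averages over classes rather than over points, rare classes weigh as much as frequent ones, which is why a method can reach a high OA while its mIoU stays much lower on a dataset with strong class imbalance.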