Fast Two-Stage 3D Object Detection with Semantic Guidance
With the continuous increase in LiDAR sampling rates, systems can rapidly acquire high-resolution point clouds of a scene. Dense point clouds improve the accuracy of 3D object detection; however, they also increase the computational load. In addition, point-based 3D object detection methods struggle to balance speed and accuracy. To improve the computational efficiency of multilevel downsampling in 3D object detection and to address the low foreground-point recall and size ambiguity of one-stage networks, a fast two-stage method based on semantic guidance is proposed herein. In the first stage, a semantic-guided downsampling method enables the deep network to perceive foreground points efficiently. In the second stage, a channel-aware pooling method aggregates the semantic information of the sampled points by adding pooled points, thereby enriching the feature description of regions of interest and yielding more accurate proposal boxes. Test results on the KITTI dataset show that, compared with similar two-stage baseline methods, the proposed method improves detection accuracy by up to 4.62, 1.44, and 3.91 percentage points for cars, pedestrians, and cyclists, respectively. Furthermore, the inference speed reaches 55.6 frames/s, surpassing the fastest benchmark by 31.1%. The algorithm performs strongly in both accuracy and speed, giving it practical value for real-world applications.
Keywords: point cloud; semantic-guided downsampling; channel-aware pooling; 3D object detection
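The abstract does not spell out how semantic guidance enters the downsampling step. One common way to realize such a scheme, shown here purely as a hedged sketch (the function name, the score-weighting rule, and the choice of seed point are all assumptions, not the paper's actual algorithm), is to bias farthest point sampling so that points a semantic head scores as likely foreground are retained preferentially:

```python
import numpy as np

def semantic_guided_fps(points, fg_scores, n_sample, gamma=1.0):
    """Farthest point sampling biased toward likely foreground points.

    Hypothetical sketch: each point's FPS distance is scaled by its
    predicted foreground score raised to gamma, so geometrically
    spread-out points that also look like foreground win the argmax.

    points:    (N, 3) xyz coordinates
    fg_scores: (N,) predicted foreground probabilities in [0, 1]
    n_sample:  number of points to keep
    """
    n = points.shape[0]
    weights = np.power(fg_scores, gamma)        # foreground emphasis
    selected = np.zeros(n_sample, dtype=np.int64)
    min_dist = np.full(n, np.inf)
    # Assumed choice: seed from the highest-scoring point rather
    # than a random one, so the first sample is likely foreground.
    selected[0] = int(np.argmax(fg_scores))
    for i in range(1, n_sample):
        diff = points - points[selected[i - 1]]
        dist = np.einsum("ij,ij->i", diff, diff)  # squared distances
        min_dist = np.minimum(min_dist, dist)
        # Semantic guidance: weight geometric spread by the score.
        selected[i] = int(np.argmax(min_dist * weights))
    return selected
```

With gamma = 0 this degenerates to plain farthest point sampling; larger gamma trades geometric coverage for foreground recall, which matches the abstract's goal of raising foreground-point recall during multilevel downsampling.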