Small object grasping detection in cluttered scenes
Objective Grasp pose detection in cluttered scenes is an essential skill for intelligent robots. Despite recent advances in learning six degrees-of-freedom grasping, learning grasp configurations for small objects remains extremely challenging. First, because raw point clouds are very large, the points in a scene must be downsampled to reduce the computational complexity of the network and increase detection efficiency. However, previous sampling methods place fewer points on small objects, which makes their grasp poses difficult to learn. In addition, the consumer-grade depth cameras currently on the market are very noisy, and the quality of the point clouds obtained on small objects in particular cannot be guaranteed, so the objectness that the network predicts for points on small objects may be unclear. Some feasible grasp points are then mistaken for background points, which further reduces the number of sampled points on small objects and results in weak grasping performance on them.
Method A potential problem in previous grasp detection methods is that they do not account for the biased distribution of sampled points caused by differences in object scale, which leaves fewer sampled points on small objects. In this study, we propose an object mask-assisted sampling method that samples the same number of points on every object, balancing the grasp distribution and resolving the uneven distribution of sampled points. At inference time, point-level scene masks are not known a priori, so we introduce an unseen object instance segmentation network to distinguish the objects in the scene and thereby enable mask-assisted sampling. In addition, we adopt a multi-scale learning strategy: multi-scale cylindrical grouping is applied to the partial point clouds of objects to improve the local geometric representation, addressing the difficulty of learning grasp operation parameters caused by differences in object scale. Specifically, we set up three cylinders with different radii to sample the point cloud near each graspable point, corresponding to features of large, medium, and small objects, and then concatenate the features of the three scales. We then process the concatenated features with a self-attention layer to strengthen attention to the local region and improve the local geometric representation of the object. Similar to GraspNet, we design an end-to-end grasping network that consists of three parts: graspable point prediction, approach direction prediction, and gripper operation prediction. Graspable points are the high-scoring points in the scene that are suitable for grasping. They serve as an initial filter over the large amount of point cloud data in the scene and are then fed into the proposed sampling and learning methods to further predict the approach direction and gripper operation for grasp poses on an object. Designing an end-to-end grasping network that embeds the proposed sampling and learning approach effectively improves grasp detection capability.
Result The proposed method achieves state-of-the-art performance on the large-scale benchmark dataset GraspNet-1Billion, where the grasping metrics on small objects improve by 7% on average. A large number of real-robot experiments also show that the approach generalizes well to unseen objects. To show the improvement on small objects more intuitively, we take the most representative previous method, the graspness-based sampling network (GSNet), as the baseline and visualize the grasp detection results of the baseline and the proposed method in four cluttered scenes. The visualizations show that the previous method tends to predict grasps on large objects in the scene and fails to produce reasonable grasp poses on some small objects. By contrast, the proposed method accurately predicts grasp poses on small objects.
Conclusion Focusing on grasping small objects, this study proposes a mask-assisted sampling method embedded in an end-to-end learning network and introduces a multi-scale grouping learning strategy to improve the local geometric representation of objects. Together, they effectively improve grasp quality on small objects and outperform previous methods when evaluated on all objects. However, the proposed method has certain limitations. For example, when the input depth maps are noisy and of low quality, existing unseen object instance segmentation methods may produce incorrect object masks, causing mask-assisted sampling to fail. In the future, we plan to investigate more robust unseen object instance segmentation methods that can correct erroneous segmentation results under low-quality depth input. This will yield more accurate object instance masks and enhance grasp detection capability in cluttered scenes.
six degrees-of-freedom grasping; sampling strategy; multiscale learning; point cloud learning; deep learning
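The two ideas described in the Method part, sampling the same number of points on every object mask and grouping neighbours in cylinders of three radii, can be illustrated with a small sketch. This is a pure-Python illustration under stated assumptions, not the authors' implementation: the function names, the convention that label 0 means background, sampling with replacement for objects that have too few points, and the radius and height values are all hypothetical. Feature extraction, concatenation, and the self-attention layer are omitted.

```python
import random
from collections import defaultdict


def balanced_sample(labels, per_object, seed=0):
    """Pick the same number of point indices from every object mask.

    labels[i] is the instance id of point i; 0 is assumed to be background.
    Objects with fewer than `per_object` points are sampled with replacement
    (an assumption; the abstract does not say how this case is handled).
    """
    rng = random.Random(seed)
    by_object = defaultdict(list)
    for idx, obj_id in enumerate(labels):
        if obj_id != 0:
            by_object[obj_id].append(idx)

    sampled = {}
    for obj_id, indices in sorted(by_object.items()):
        if len(indices) >= per_object:
            sampled[obj_id] = rng.sample(indices, per_object)
        else:
            sampled[obj_id] = [rng.choice(indices) for _ in range(per_object)]
    return sampled


def cylinder_group(points, center, axis, radius, height):
    """Return indices of points inside a cylinder that starts at `center`.

    `axis` must be a unit vector, for example a candidate approach direction.
    """
    inside = []
    for idx, p in enumerate(points):
        d = [p[k] - center[k] for k in range(3)]
        along = sum(d[k] * axis[k] for k in range(3))          # distance along the axis
        radial_sq = sum(c * c for c in d) - along * along      # squared distance from the axis
        if 0.0 <= along <= height and radial_sq <= radius * radius:
            inside.append(idx)
    return inside


def multi_scale_groups(points, center, axis,
                       radii=(0.01, 0.03, 0.05), height=0.04):
    """Group neighbours at three radii; the numeric values are placeholders."""
    return [cylinder_group(points, center, axis, r, height) for r in radii]
```

With `labels = [0] * 50 + [1] * 40 + [2] * 5`, calling `balanced_sample(labels, per_object=8)` returns eight indices for the 40-point object and eight for the 5-point object, whereas uniform sampling over the scene would favour the larger one. In a full network, each of the three index groups from `multi_scale_groups` would be encoded into a feature vector before the per-scale features are concatenated.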