To address the low accuracy of grasp position detection for occluded targets in vision-based robotic grasping, we propose an occluded-target grasp position detection method based on spatial information aggregation. Occlusion changes the target's intrinsic features in the camera's field of view, degrading both its position information and its shape and structural features. First, coordinate convolution replaces traditional convolution for feature extraction: a new coordinate channel is added to the input feature map to improve the network's ability to perceive position information. Second, a spatial information aggregation module is designed. It adopts a parallel structure to enlarge the local receptive field, encodes the channels along the spatial directions to obtain multi-scale spatial information, and then aggregates this information through nonlinear fitting so that the model better understands the target's structure and shape. Finally, the grasp detection network outputs grasp quality, angle, and width, from which the optimal grasp position is computed and the optimal grasp rectangle is established. Validated on the Cornell Grasping dataset, a self-constructed occlusion dataset, and the Jacquard dataset, the method achieves detection accuracies of 98.9%, 94.7%, and 96.0%, respectively, and a 93% success rate in 100 real grasping experiments on the experimental platform. The proposed method attains the highest detection accuracy on all three datasets and performs better in real scenes.
Key words
grasp position detection/occluded target/spatial information aggregation module/CoordConv
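As an illustration of the coordinate-convolution step described in the abstract, the following is a minimal sketch assuming the standard CoordConv formulation, in which normalized x/y coordinate channels are concatenated with the input feature map before a conventional convolution. The module name, channel sizes, and the choice of PyTorch are illustrative assumptions, not the authors' implementation.

# Minimal CoordConv sketch (assumed formulation, not the authors' code).
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Conv2d preceded by concatenation of normalized coordinate channels."""

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        # Two extra input channels carry the x and y coordinate maps.
        self.conv = nn.Conv2d(in_channels + 2, out_channels, kernel_size, **kwargs)

    def forward(self, x):
        b, _, h, w = x.shape
        # Coordinate grids normalized to [-1, 1], broadcast over the batch.
        ys = torch.linspace(-1.0, 1.0, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1.0, 1.0, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))

if __name__ == "__main__":
    layer = CoordConv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
    feat = torch.randn(2, 32, 224, 224)  # hypothetical feature map
    print(layer(feat).shape)             # torch.Size([2, 64, 224, 224])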