Research on a monocular pose estimation algorithm based on improved YOLO6D
To address the low precision, weak real-time performance, and model complexity of 6D pose estimation algorithms for low-texture objects in cluttered scenes based on monocular RGB images, a pose estimation algorithm based on an improved YOLO6D is proposed. The DarkNet-19 backbone network is replaced by ConvNeXt, a pure convolutional neural network (CNN). The output of the network is up-sampled after spatial pyramid pooling (SPP) processing, and the result is then joined with the low-level feature maps to achieve feature fusion, which improves the feature extraction capability and multi-scale capability of the network. The loss function is improved based on Focal Loss to strengthen the learning ability of the network. Using the prior size information and geometric features of the object, more 2D-3D point pairs are derived to improve the calculation precision of the perspective-n-point (PnP) algorithm. Experiments on the LINEMOD dataset show that, taking the 5-pixel threshold of 2D reprojection as the metric, the average precision of the proposed algorithm on 12 experimental objects reaches 95.60%, which is 8.14 percentage points higher than that of the original algorithm. The time consumption is about 60 ms, and the performance is significantly improved.
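To illustrate the evaluation index named above, the following is a minimal Python sketch of the 5-pixel 2D reprojection criterion as it is commonly defined for LINEMOD. A pose estimate counts as correct when the mean pixel distance between the model points projected with the ground-truth pose and with the estimated pose is below 5 pixels. The function names, the camera intrinsics, and the cube-shaped model points are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch assumes a pinhole camera without skew or lens distortion.

```python
import math


def project(points_3d, R, t, K):
    """Project 3D model points to pixel coordinates (pinhole camera, no skew)."""
    pixels = []
    for X in points_3d:
        # Camera-frame coordinates: Xc = R * X + t
        xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        # Perspective division followed by the intrinsic parameters
        u = K[0][0] * xc[0] / xc[2] + K[0][2]
        v = K[1][1] * xc[1] / xc[2] + K[1][2]
        pixels.append((u, v))
    return pixels


def reprojection_error(points_3d, R_gt, t_gt, R_est, t_est, K):
    """Mean pixel distance between ground-truth and estimated projections."""
    gt = project(points_3d, R_gt, t_gt, K)
    est = project(points_3d, R_est, t_est, K)
    dists = [math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(gt, est)]
    return sum(dists) / len(dists)


def is_correct(points_3d, R_gt, t_gt, R_est, t_est, K, threshold_px=5.0):
    """A pose is accepted when the mean reprojection error is below the threshold."""
    return reprojection_error(points_3d, R_gt, t_gt, R_est, t_est, K) < threshold_px


# Hypothetical example data: a 10 cm cube placed 1 m in front of the camera.
K = [[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]]
R_id = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cube = [(x, y, z) for x in (-0.05, 0.05) for y in (-0.05, 0.05) for z in (-0.05, 0.05)]
t_gt = (0.0, 0.0, 1.0)
t_small = (0.005, 0.0, 1.0)  # 5 mm lateral offset in the estimate
t_large = (0.02, 0.0, 1.0)   # 20 mm lateral offset in the estimate

small_ok = is_correct(cube, R_id, t_gt, R_id, t_small, K)
large_ok = is_correct(cube, R_id, t_gt, R_id, t_large, K)
```

In this toy setup, the 5 mm offset shifts the projections by roughly 3 pixels and is accepted, while the 20 mm offset shifts them by roughly 12 pixels and is rejected. The average precision reported above would then correspond to the fraction of test images for which such a check succeeds.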