An optimization method for hand pose estimation based on a unified viewpoint
Estimating the three-dimensional pose of hands accurately from depth images is an important task in computer vision. However, hand pose estimation remains extremely challenging because of the self-occlusion of hands and the self-similarity of joints. To overcome these difficulties, this paper investigates the impact of the depth-image sampling viewpoint on estimation accuracy and proposes a unified-viewpoint (UVP) network. The network resamples the input depth image to a more easily estimable "front-facing" viewpoint and then improves joint estimation accuracy using features from the original viewpoint. First, a viewpoint transformation module performs a viewpoint rotation on the input single depth image, providing a supplementary second viewpoint. Then, a viewpoint unification loss function ensures that the transformed second viewpoint aligns with the "front-facing" viewpoint, minimizing self-occlusion. Finally, lightweight network techniques, such as changing the convolutional combinations and reducing the network depth, further optimize the method's performance. Experiments on three public hand pose datasets (ICVL, NYU, and MSRA) show average joint position errors of 4.92 mm, 7.43 mm, and 7.02 mm, respectively, and the method runs at 159.39 frame/s on a computer equipped with an RTX 3070 graphics card. These results indicate that sampling depth images from different viewpoints and fusing features from the two viewpoints improve hand pose estimation accuracy. Moreover, the proposed method shows strong adaptability and generalization: it can be applied to most hand pose estimation models that take a single depth image as input, providing robust support for the application of deep learning to three-dimensional hand pose estimation.
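The core idea in the abstract, resampling a depth image to a canonical "front-facing" viewpoint, can be illustrated with a classical geometric pipeline: back-project the depth map to a point cloud with pinhole intrinsics, rotate the cloud, and re-project it to a new depth image. The sketch below is only an assumption-laden illustration of that viewpoint-transformation step, not the paper's implementation: the intrinsics `fx, fy, cx, cy` and the fixed yaw rotation are hypothetical stand-ins, whereas the UVP network learns the transformation end to end.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image to an N x 3 point cloud (pinhole model)."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0                      # ignore empty (background) pixels
    z = depth[valid]
    x = (us[valid] - cx) * z / fx
    y = (vs[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def rotate_points(points, yaw_deg):
    """Rotate the cloud about the y-axis -- a stand-in for the learned transform."""
    a = np.deg2rad(yaw_deg)
    R = np.array([[ np.cos(a), 0.0, np.sin(a)],
                  [ 0.0,       1.0, 0.0      ],
                  [-np.sin(a), 0.0, np.cos(a)]])
    return points @ R.T

def points_to_depth(points, fx, fy, cx, cy, h, w):
    """Re-project the rotated cloud into a depth image with simple z-buffering."""
    depth = np.zeros((h, w))
    x, y, z = points.T
    us = np.round(x * fx / z + cx).astype(int)
    vs = np.round(y * fy / z + cy).astype(int)
    ok = (z > 0) & (us >= 0) & (us < w) & (vs >= 0) & (vs < h)
    # Write farthest points first so nearer points overwrite them.
    for u, v, d in sorted(zip(us[ok], vs[ok], z[ok]), key=lambda t: -t[2]):
        depth[v, u] = d
    return depth
```

With a yaw of 0° the pipeline is an identity round trip; a nonzero yaw yields the resampled second viewpoint that the network would then unify with the "front-facing" view via the unification loss.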