3D Hand Pose Estimation Based on Feature Disentanglement towards Complicated Environment
Visual 3D hand pose estimation is a crucial technique in the field of human-computer interaction. Current approaches to visual hand pose estimation often struggle to remain robust under complicated environmental factors such as illumination, occlusion, and environmental noise. These variable factors make it difficult for traditional deep learning-based methods to achieve satisfactory results in real-world scenarios. To address this challenge, we propose a hand pose estimation approach based on feature disentanglement, which aims to enhance the model's robustness in diverse environments by refining key features in hand images. Specifically, we first pretrain the encoder with spectrum augmentation, reducing the influence of environmental noise on low-level visual feature extraction. A dual-branch structure is then introduced at the decoder stage to decouple causal and non-causal features, decreasing the impact of non-causal features on the pose estimation task and improving the model's capability to handle complicated environments. Finally, global posture information and local joint information are fused to achieve multi-scale refinement of the estimate. Qualitative and quantitative results on two public datasets demonstrate the superior performance and robustness of the proposed method.
feature separation; complicated environment; 3D hand pose estimation; causal and non-causal feature disentanglement; fusion of global and local information
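
The abstract does not specify how spectrum augmentation is carried out. As an illustration only, the sketch below assumes one common form of the idea: the Fourier amplitude of an image is randomly rescaled while its phase is kept, so that appearance statistics such as illumination vary but the spatial structure of the hand is largely preserved. The function name `spectrum_augment` and the parameter `strength` are hypothetical and are not taken from the paper.

```python
import numpy as np


def spectrum_augment(image, strength=0.2, rng=None):
    """Randomly rescale the Fourier amplitude of an image, keeping its phase.

    Assumption: `image` is a float array of shape (H, W) or (H, W, C) with
    values in [0, 1]. This is an illustrative sketch, not the paper's method.
    """
    rng = np.random.default_rng() if rng is None else rng

    # 2D FFT over the two spatial axes (each channel is transformed separately).
    spectrum = np.fft.fft2(image, axes=(0, 1))
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # Per-frequency multiplicative jitter in [1 - strength, 1 + strength].
    scale = rng.uniform(1.0 - strength, 1.0 + strength, size=amplitude.shape)

    # Recombine the perturbed amplitude with the original phase and invert.
    perturbed = amplitude * scale * np.exp(1j * phase)
    augmented = np.fft.ifft2(perturbed, axes=(0, 1))

    # Keep the real part and clip back to the valid intensity range.
    return np.clip(augmented.real, 0.0, 1.0)
```

In a pretraining loop, such a function would typically be applied to each input image before it is passed to the encoder, with `strength` controlling how strongly the appearance is perturbed; setting `strength=0` leaves the image unchanged up to floating-point error.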