智能安全, 2024, Vol. 3, Issue 3: 54-65. DOI: 10.12407/j.issn.2097-2075.2024.03.054

基于特征分离的复杂环境三维手部姿态估计算法研究

3D Hand Pose Estimation Based on Feature Disentanglement towards Complicated Environment

Gao Kun (高鲲) 1, Zhang Haoyang (张皓洋) 2, Li Da (李达) 3, Yan Ye (闫野) 2, Yin Erwei (印二威) 2

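Author information

  • 1. College of Engineering, Peking University, Beijing 100091; Defense Innovation Institute, Academy of Military Sciences, Beijing 100071; Intelligent Game and Decision Laboratory, Beijing 100071; Tianjin (Binhai) Artificial Intelligence Innovation Center, Tianjin 300450
  • 2. Defense Innovation Institute, Academy of Military Sciences, Beijing 100071; Intelligent Game and Decision Laboratory, Beijing 100071; Tianjin (Binhai) Artificial Intelligence Innovation Center, Tianjin 300450
  • 3. Defense Innovation Institute, Academy of Military Sciences, Beijing 100071; Intelligent Game and Decision Laboratory, Beijing 100071; Tianjin (Binhai) Artificial Intelligence Innovation Center, Tianjin 300450; College of Software, Nankai University, Tianjin 300071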

Abstract

Vision-based 3D hand pose estimation is an important technique for human-computer interaction. Current visual hand pose estimation algorithms are easily disturbed by complicated environmental factors such as illumination changes, occlusion, and environmental noise, so the robustness of the model cannot be guaranteed. These variable factors make it difficult for traditional deep learning-based methods to achieve satisfactory results in real-world scenarios. To address this challenge, we propose a hand pose estimation approach based on feature disentanglement, which improves robustness across diverse environments by refining the key features in hand images. First, the encoder is pretrained with frequency-domain (spectrum) augmentation, which reduces the influence of environmental noise on low-level visual feature extraction. Second, a dual-branch structure is introduced in the decoding stage to separate causal from non-causal features, reducing the impact of non-causal features on the pose estimation task and improving the model's ability to handle complicated environments. Finally, global pose information and local joint information are fused to achieve unified optimization across scales. Quantitative and qualitative analyses on two public datasets verify the accuracy and robustness of the proposed method.
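The abstract does not spell out which frequency-domain augmentation is used for encoder pretraining. As a minimal sketch, assuming one common variant (randomly rescaling the Fourier amplitude of an image while keeping its phase), the idea could look as follows. The function name `amplitude_jitter` and the `strength` parameter are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def amplitude_jitter(image, strength=0.2, rng=None):
    """Randomly rescale the Fourier amplitude of an image, keeping its phase.

    Hypothetical helper for illustration only; the paper's actual
    augmentation may differ.
    image: float array of shape (H, W) or (H, W, C).
    strength: half-width of the uniform scaling range around 1.0.
    """
    rng = np.random.default_rng() if rng is None else rng

    # 2D FFT over the spatial axes (each channel is transformed separately).
    spectrum = np.fft.fft2(image, axes=(0, 1))
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # One multiplicative factor per frequency bin, in [1 - strength, 1 + strength].
    scale = rng.uniform(1.0 - strength, 1.0 + strength, size=amplitude.shape)
    perturbed = (amplitude * scale) * np.exp(1j * phase)

    # The random scale is not conjugate-symmetric, so the inverse transform
    # is complex; keeping the real part is equivalent to applying a
    # symmetrized version of the scale.
    return np.real(np.fft.ifft2(perturbed, axes=(0, 1)))


# Example: augment a random stand-in for an RGB hand crop.
demo_rng = np.random.default_rng(0)
demo_image = demo_rng.random((8, 8, 3))
demo_augmented = amplitude_jitter(demo_image, strength=0.2, rng=demo_rng)
```

Perturbing only the amplitude is a design choice in this sketch: phase tends to carry the spatial layout of an image, so the hand structure is largely preserved while appearance statistics vary.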

Key words

feature separation / complicated environment / 3D hand pose estimation / causal and non-causal feature disentanglement / fusion of global and local information


Publication year

2024
智能安全
Defense Innovation Institute, Academy of Military Sciences

ISSN: 2097-2075