Journal of Graphics (图学学报), 2024, Vol. 45, Issue 1: 159-168. DOI: 10.11996/JG.j.2095-302X.2024010159

A 3D human pose estimation approach based on spatio-temporal motion interaction modeling

Lyu Heng (吕衡) 1, Yang Hongyu (杨鸿宇) 2

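Author information

  • 1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  • 2. Institute of Artificial Intelligence, Beihang University, Beijing 100191, China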

Abstract

3D human pose estimation plays an important role in fields such as virtual reality and human-computer interaction. In recent years, the Transformer has been introduced into 3D human pose estimation to capture the spatio-temporal motion information of human joints. However, existing studies typically either focus on the overall motion of the full set of joints or model the motion of each joint in isolation; neither examines in depth the distinct motion pattern of each joint or the mutual influence between the motions of different joints. This paper therefore proposes a method that learns the spatial information of the 2D human joints in each frame in detail and analyzes the specific motion pattern of each joint. A motion information interaction module based on the Transformer encoder is designed to accurately capture the dynamic motion relationships between different joints. Compared with existing models that directly learn the overall motion of the human joints, the proposed method improves prediction accuracy by approximately 3%. Compared with the state-of-the-art MixSTE model, which focuses on single-joint motion, the proposed model captures the spatio-temporal features of joints more efficiently and increases inference speed by more than 20%, making it better suited to real-time inference scenarios.
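The abstract does not give the internals of the motion information interaction module, so the following is only an illustrative sketch of the general idea of letting per-joint motion features attend to one another, not the authors' architecture. It assumes (hypothetically) that each joint is summarised by its mean frame-to-frame 2D displacement and that a single scaled dot-product self-attention step with identity query/key/value projections stands in for a learned Transformer encoder layer. The names `motion_tokens` and `joint_interaction` are made up for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def motion_tokens(seq):
    """Build one toy motion descriptor per joint.

    seq[t][j] is the (x, y) position of joint j in frame t.
    Assumption for illustration: the descriptor is the mean
    frame-to-frame displacement, not a learned temporal feature.
    """
    n_frames, n_joints = len(seq), len(seq[0])
    tokens = []
    for j in range(n_joints):
        dx = sum(seq[t + 1][j][0] - seq[t][j][0] for t in range(n_frames - 1))
        dy = sum(seq[t + 1][j][1] - seq[t][j][1] for t in range(n_frames - 1))
        tokens.append([dx / (n_frames - 1), dy / (n_frames - 1)])
    return tokens


def joint_interaction(tokens):
    """Scaled dot-product self-attention across joint tokens.

    Identity Q/K/V projections are used (an assumption); a real
    Transformer encoder layer would learn these projections and add
    multi-head attention, residuals, normalisation and an MLP.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    mixed, weights = [], []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        mixed.append(
            [sum(w[j] * tokens[j][c] for j in range(len(tokens))) for c in range(dim)]
        )
    return mixed, weights


# Toy sequence: 3 frames, 3 joints (one moves in x, one is static, one moves in y).
seq = [
    [(0, 0), (5, 5), (2, 0)],
    [(1, 0), (5, 5), (2, 1)],
    [(2, 0), (5, 5), (2, 2)],
]
tokens = motion_tokens(seq)
mixed, weights = joint_interaction(tokens)
```

Each row of `weights` shows how strongly one joint's motion descriptor draws on every other joint's, and `mixed` holds the resulting interaction-aware descriptors. In this toy setup the static joint has a zero descriptor, so its attention is spread evenly over all joints.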

Keywords

3D human pose estimation / Transformer encoder / inter-joint motion / spatio-temporal information correlation / real-time inference


Funding

Beijing Natural Science Foundation (4222049)

National Natural Science Foundation of China (62202031)

Publication year

2024
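Journal of Graphics (图学学报)
China Graphics Society (中国图学学会)

Indexed in: CSTPCD, Peking University Core Journals (北大核心)
Impact factor: 0.73
ISSN: 2095-302X
References: 31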