A 3D human pose estimation approach based on spatio-temporal motion interaction modeling
3D human pose estimation plays a crucial role in fields such as virtual reality and human-computer interaction. In recent years, Transformers have been introduced into 3D human pose estimation to capture the spatiotemporal motion information of human joints. However, existing studies typically focus on the collective movement of joint clusters or exclusively model the movement of individual joints, without delving into the unique movement pattern of each joint and the interdependencies among joints. Consequently, an innovative approach is proposed that carefully learns the spatial information of the 2D human joints in each frame and analyzes the specific movement pattern of each joint in depth. Through a motion information interaction module built on the Transformer encoder, the proposed method accurately captures the dynamic relationships between different joints. Compared with existing models that directly learn the overall motion of human joints, the proposed method improves prediction accuracy by approximately 3%. Benchmarked against the state-of-the-art MixSTE model, which primarily focuses on individual joint movement, the proposed model captures the spatiotemporal features of joints more efficiently, achieving an inference speed improvement of over 20% and making it especially suitable for real-time inference scenarios.
3D human pose estimation; Transformer encoder; inter-joint motion; temporal-spatial information correlation; real-time inference
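To make the described inter-joint motion interaction idea concrete, the following is a minimal illustrative sketch, not the paper's released implementation: it embeds each joint's 2D trajectory over a window of frames into a per-joint token and applies a standard PyTorch Transformer encoder across the joint tokens so that self-attention models their motion interdependencies. The class name JointMotionInteraction and all dimensions (17 joints, 27 frames, 64-dimensional embedding) are assumptions chosen for illustration.

```python
# Illustrative sketch only (assumed design, not the authors' code): a per-joint
# motion interaction module based on nn.TransformerEncoder. Each joint's 2D
# trajectory over T frames becomes one token; self-attention across the J joint
# tokens captures inter-joint motion relations before 3D regression.
import torch
import torch.nn as nn


class JointMotionInteraction(nn.Module):
    def __init__(self, num_joints=17, num_frames=27, in_dim=2,
                 embed_dim=64, num_heads=4, num_layers=2):
        super().__init__()
        # Embed the full 2D trajectory of each joint (T frames x 2 coords) into one token.
        self.motion_embed = nn.Linear(num_frames * in_dim, embed_dim)
        # Learnable per-joint embedding so the encoder can distinguish joints.
        self.joint_pos = nn.Parameter(torch.zeros(1, num_joints, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=embed_dim * 2, batch_first=True)
        # Self-attention across joint tokens models inter-joint motion dependencies.
        self.interaction = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Regress the 3D position of each joint.
        self.head = nn.Linear(embed_dim, 3)

    def forward(self, x):
        # x: (batch, frames, joints, 2) sequence of 2D keypoints
        b, t, j, c = x.shape
        tokens = x.permute(0, 2, 1, 3).reshape(b, j, t * c)  # one token per joint
        tokens = self.motion_embed(tokens) + self.joint_pos
        tokens = self.interaction(tokens)                     # inter-joint attention
        return self.head(tokens)                              # (batch, joints, 3)


if __name__ == "__main__":
    model = JointMotionInteraction()
    pose_2d = torch.randn(1, 27, 17, 2)   # dummy 2D pose sequence
    print(model(pose_2d).shape)            # torch.Size([1, 17, 3])
```

In this sketch the attention operates over joints rather than over frames, which is one plausible way to realize the abstract's claim of modeling each joint's own motion pattern while still capturing dynamic relationships between different joints.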