A 3D human pose estimation approach based on spatio-temporal motion interaction modeling
3D human pose estimation plays a crucial role in fields such as virtual reality and human-computer interaction. In recent years, Transformers have been introduced into 3D human pose estimation to capture the spatiotemporal motion information of human joints. However, existing studies typically focus on the collective movement of joint clusters or exclusively model the movement of individual joints, without delving into the unique movement pattern of each joint and the interdependencies among joints. Consequently, an innovative approach is proposed that carefully learns the spatial information of the 2D human joints in each frame and analyzes the specific movement pattern of each joint in depth. Through a motion information interaction module built on the Transformer encoder, the proposed method accurately captures the dynamic relationships between different joints. Compared with existing models that directly learn the overall motion of human joints, the proposed method improves prediction accuracy by approximately 3%. Benchmarked against the state-of-the-art MixSTE model, which primarily focuses on individual joint movement, the proposed model captures the spatiotemporal features of joints more efficiently, achieving an inference speed improvement of over 20% and making it especially suitable for real-time inference scenarios.
3D human pose estimation; Transformer encoder; inter-joint motion; temporal-spatial information correlation; real-time inference
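To make the described inter-joint motion interaction idea concrete, the following is a minimal illustrative sketch, not the paper's released implementation: it embeds each joint's 2D trajectory over a window of frames into a per-joint token and applies a standard PyTorch Transformer encoder across the joint tokens so that self-attention models their motion interdependencies. The class name JointMotionInteraction and all dimensions (17 joints, 27 frames, 64-dimensional embedding) are assumptions chosen for illustration.

```python
# Illustrative sketch only (assumed design, not the authors' code): a per-joint
# motion interaction module based on nn.TransformerEncoder. Each joint's 2D
# trajectory over T frames becomes one token; self-attention across the J joint
# tokens captures inter-joint motion relations before 3D regression.
import torch
import torch.nn as nn


class JointMotionInteraction(nn.Module):
    def __init__(self, num_joints=17, num_frames=27, in_dim=2,
                 embed_dim=64, num_heads=4, num_layers=2):
        super().__init__()
        # Embed the full 2D trajectory of each joint (T frames x 2 coords) into one token.
        self.motion_embed = nn.Linear(num_frames * in_dim, embed_dim)
        # Learnable per-joint embedding so the encoder can distinguish joints.
        self.joint_pos = nn.Parameter(torch.zeros(1, num_joints, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=embed_dim * 2, batch_first=True)
        # Self-attention across joint tokens models inter-joint motion dependencies.
        self.interaction = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Regress the 3D position of each joint.
        self.head = nn.Linear(embed_dim, 3)

    def forward(self, x):
        # x: (batch, frames, joints, 2) sequence of 2D keypoints
        b, t, j, c = x.shape
        tokens = x.permute(0, 2, 1, 3).reshape(b, j, t * c)  # one token per joint
        tokens = self.motion_embed(tokens) + self.joint_pos
        tokens = self.interaction(tokens)                     # inter-joint attention
        return self.head(tokens)                              # (batch, joints, 3)


if __name__ == "__main__":
    model = JointMotionInteraction()
    pose_2d = torch.randn(1, 27, 17, 2)   # dummy 2D pose sequence
    print(model(pose_2d).shape)            # torch.Size([1, 17, 3])
```

In this sketch the attention operates over joints rather than over frames, which is one plausible way to realize the abstract's claim of modeling each joint's own motion pattern while still capturing dynamic relationships between different joints.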