3D Human Pose and Shape Estimation Based on Live Video Streams
A 3D human body pose and shape estimation method based on a temporal attention mechanism is proposed to meet the real-time, accuracy, and realism requirements of applications such as the metaverse, gaming, and virtual reality. First, image features are extracted and fed into a motion continuity attention module, which calibrates the range of the time sequence that requires attention. Then, a real-time feature attention integration module combines the feature representations of the current frame and past frames. Finally, a human parameter regression network produces the final results, and a graph convolutional generative adversarial network judges whether the estimated model comes from real human motion data. Compared with previous methods based on real-time video streams, the proposed method reduces the acceleration error by an average of 30% on mainstream datasets, while reducing the network parameters and computational complexity by 65%. In practical tests, the proposed method estimates 3D human body pose and shape at 55 to 60 frames per second, providing a better user experience and higher application value for applications such as the metaverse, gaming, and virtual reality.
3D human reconstruction; skinned multi-person linear model; real-time feature attention integration; graph convolutional neural network; machine learning
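
The abstract reports its temporal smoothness gain as a reduction in acceleration error. As a minimal sketch of how such a metric is commonly computed (an assumption: the abstract does not give the exact definition or units used), the error can be taken as the mean distance between the second-order finite differences of predicted and ground-truth 3D joint positions. The function name `acceleration_error` and the array layout `(T, J, 3)` are illustrative choices, not taken from the source.

```python
import numpy as np


def acceleration_error(pred_joints, gt_joints):
    """Mean per-joint acceleration error between two joint trajectories.

    Both inputs have shape (T, J, 3): T frames, J joints, xyz coordinates.
    Acceleration is approximated per frame with a central second difference,
    so the result is in position units per frame squared.
    (Hypothetical helper: the exact definition used by the paper is assumed.)
    """
    pred = np.asarray(pred_joints, dtype=float)
    gt = np.asarray(gt_joints, dtype=float)
    if pred.shape != gt.shape or pred.ndim != 3 or pred.shape[0] < 3:
        raise ValueError("expected matching arrays of shape (T >= 3, J, 3)")

    # Second-order finite difference along the time axis: x[t+1] - 2x[t] + x[t-1]
    accel_pred = pred[2:] - 2.0 * pred[1:-1] + pred[:-2]
    accel_gt = gt[2:] - 2.0 * gt[1:-1] + gt[:-2]

    # Euclidean distance between the acceleration vectors, shape (T-2, J)
    per_joint = np.linalg.norm(accel_pred - accel_gt, axis=-1)
    return float(per_joint.mean())


# Usage: a smooth ground-truth motion and a prediction with frame-to-frame jitter
frames = np.arange(6)
gt = np.zeros((6, 2, 3))
gt[:, :, 0] = (frames ** 2)[:, None]                # smooth motion along x
pred = gt.copy()
pred[:, :, 0] += ((-1.0) ** frames)[:, None]        # alternating +1/-1 jitter
print(acceleration_error(pred, gt))
```

A prediction that only jitters between frames can have a small position error yet a large acceleration error, which is why this metric is used to judge temporal consistency in video-based estimation.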