Clothed feature learning for single-view 3D human reconstruction
Objective  Clothed human reconstruction is an important problem in computer vision and computer graphics. It aims to generate three-dimensional (3D) human body models, including clothing and accessories, and is widely used in virtual reality, digital humans, computer-aided 3D garment design, film and television special effects, and other scenarios. Compared with the large number of single-view images available on the internet, multiview images are far more difficult to obtain. Because single-view images are easy to acquire, which greatly reduces the capture requirements and hardware cost of reconstruction, we take a single-view image as input, establish a complete mapping between the single-view human image and the human shape, and recover the 3D shape and geometric details of the body. Most methods based on parametric models can only predict the shape and pose of a human body with a smooth surface, whereas nonparametric methods lack a fixed mesh topology when generating fine geometric shapes. High-precision 3D human models can be extracted by combining a parametric human model with an implicit function. Given that clothing undergoes dynamic flexible deformation as the human pose changes, most methods focus on obtaining the fold details of a clothed human model from 3D mesh deformation. With the assistance of a clothing template, the clothing can be separated from the body, and the flexible deformation of the clothing caused by body pose can be obtained directly with a learning-based method. Owing to overlapping limbs, occlusion, and the complex clothed poses encountered in single-view 3D human reconstruction, obtaining a geometric representation under various clothed poses and viewing angles is difficult. Moreover, existing methods can only extract and represent visual features from clothed human images and do not consider the dynamic details produced by complex clothed poses. Representing and learning the pose-related clothed features of a single-view clothed human and generating a clothed mesh with complex poses and dynamic folds therefore remain difficult. In this paper, we propose a clothed feature learning method for single-view 3D human reconstruction.

Method  We propose a feature learning approach that reconstructs a clothed human from a single-view image. The experimental hardware platform used two NVIDIA GeForce GTX 1080 Ti GPUs. We used the clothing co-parsing fashion street-photography dataset, which contains 2 098 human images, to analyze the physical features of clothing. The Human3.6M dataset was used to learn human pose features, with the test set taken from 3DPW, which is captured in the wild. For fold feature learning, we used subjects 00096 and 00159 from the CAPE dataset. To train the clothed mesh more effectively, we selected 150 meshes whose poses are close to the clothed poses from the THuman2.0 dataset as the training set for shape feature learning. First, we represented the limb features of the single-view image and extracted the clothed human pose features through 2D keypoint prediction and deep regression of pose features. Then, based on the clothed pose features, we defined a clothing-fold sampling space centered on the flexible-deformation joints and a flexible-deformation loss function. Flexible clothing deformation was learned by introducing a clothing template into the input ground-truth clothed body model to obtain fold features; we focused only on the crucial details inside the sampling space (a sketch of this step is given below). Next, the human shape feature learning module was constructed by combining pose parameter regression, feature map sampling, and an encoder-decoder. Pixel-aligned and voxel-aligned features were learned from the corresponding images and meshes of a 3D human mesh dataset, and the shape features of the human body were decoded. Finally, by combining the fold features, the shape features of the clothed human, and the computed 3D sampling space, the 3D human mesh was reconstructed by defining a signed distance field, and the final clothed human model was output.
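The abstract specifies the fold sampling space and the flexible-deformation loss only at a high level. The following PyTorch sketch shows one plausible reading: vertices within a fixed radius of the flexible-deformation joints (e.g., knees and elbows) form the sampling space, and the loss is evaluated only there. FLEX_JOINTS, RADIUS, and the mean-squared form of the loss are assumptions, not the paper's definitions.

```python
# A minimal sketch of the joint-centered fold sampling space and the
# flexible-deformation loss. FLEX_JOINTS, RADIUS, and all tensor shapes
# are illustrative assumptions, not the paper's exact definitions.
import torch

FLEX_JOINTS = [4, 5, 18, 19]  # e.g., knees and elbows (hypothetical SMPL indices)
RADIUS = 0.15                 # sampling radius around each joint, in meters (assumed)

def fold_sampling_mask(verts: torch.Tensor, joints: torch.Tensor) -> torch.Tensor:
    """Mark clothed-mesh vertices that fall inside a sphere around any
    flexible-deformation joint; only these points enter the fold loss."""
    # verts: (N, 3) mesh vertices; joints: (J, 3) body joint positions
    dist = torch.cdist(verts, joints[FLEX_JOINTS])  # (N, K) vertex-joint distances
    return (dist < RADIUS).any(dim=1)               # (N,) boolean mask

def flexible_deformation_loss(pred_verts, gt_verts, joints):
    """Mean squared vertex error restricted to the fold sampling space."""
    mask = fold_sampling_mask(gt_verts, joints)
    return ((pred_verts[mask] - gt_verts[mask]) ** 2).sum(dim=-1).mean()
```

Restricting the loss in this way concentrates supervision on the regions where pose-driven folds actually form, rather than on the whole garment surface.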
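The final step evaluates the learned signed distance field on a grid of query points and extracts its zero level set as the clothed mesh. Below is a minimal sketch of this step; sdf_net, its feature arguments, the grid bounds, and the resolution are hypothetical interfaces, since the abstract does not specify them.

```python
# A minimal sketch of SDF-based mesh extraction. `sdf_net`, `fold_feat`,
# and `shape_feat` are hypothetical; in practice the grid queries would
# be batched to bound memory use.
import torch
from skimage.measure import marching_cubes

def reconstruct_mesh(sdf_net, fold_feat, shape_feat, res=128, bound=1.0):
    # Regular grid of 3D query points covering the (assumed) normalized body space.
    axis = torch.linspace(-bound, bound, res)
    grid = torch.stack(torch.meshgrid(axis, axis, axis, indexing="ij"), dim=-1)
    pts = grid.reshape(-1, 3)

    with torch.no_grad():
        # Query the implicit function conditioned on the learned features.
        sdf = sdf_net(pts, fold_feat, shape_feat).reshape(res, res, res)

    # The reconstructed surface is the zero level set of the signed distance field.
    verts, faces, _, _ = marching_cubes(sdf.cpu().numpy(), level=0.0)
    verts = verts / (res - 1) * 2 * bound - bound  # grid indices -> world coordinates
    return verts, faces
```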
Result  We evaluated pose feature learning and single-view 3D clothed human reconstruction on the 3DPW and THuman2.0 datasets. On 3DPW, we compared our results with those of three other methods. The mean per-joint position error (MPJPE) and the Procrustes-aligned MPJPE (PA-MPJPE) were used to evaluate the differences between the predicted 3D joints and the ground truth, and the mean per-vertex position error (MPVPE) was used to evaluate the predicted SMPL 3D human shape against the ground-truth mesh. Compared with the second-best model, our method reduced the error by 1.28 in MPJPE, 1.66 in PA-MPJPE, and 2.38 in MPVPE, and the average error was reduced by 2.4%. For 3D reconstruction of the clothed human body, we compared four methods on the THuman2.0 dataset, using the Chamfer distance (CD) and the point-to-surface distance (P2S) in 3D space to evaluate the gap between the two groups of 3D meshes. Notably, the P2S of our reconstruction is 4.4% lower than that of the second-best model, and the CD is 2.6% lower. The experimental results reveal that the pose feature learning module contributes to reconstructing complete limbs with correct poses, and that fold feature learning, which refines the learned shape features, yields high-precision reconstruction results.

Conclusion  The proposed clothed feature learning method for single-view 3D human reconstruction effectively learns the clothed features of a single-view image and generates clothed human reconstruction results with complex poses and dynamic folds.
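For reference, the two mesh-level metrics used in the Result section, the Chamfer distance and the point-to-surface distance, can be computed on point sets sampled from the reconstructed and ground-truth meshes. A minimal sketch follows; whether distances are squared, averaged one way or symmetrically, and how densely the surfaces are sampled vary across papers and are assumptions here.

```python
# A minimal sketch of the CD and P2S evaluation metrics on sampled point
# sets. Conventions (squared vs. unsquared distances, sampling density)
# are assumptions; published definitions vary.
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(pts_a: np.ndarray, pts_b: np.ndarray) -> float:
    """Symmetric mean nearest-neighbor distance between two point sets."""
    d_ab, _ = cKDTree(pts_b).query(pts_a)  # each point in A to its nearest in B
    d_ba, _ = cKDTree(pts_a).query(pts_b)  # each point in B to its nearest in A
    return 0.5 * (d_ab.mean() + d_ba.mean())

def point_to_surface(pred_pts: np.ndarray, gt_surface_pts: np.ndarray) -> float:
    """One-sided distance from reconstructed points to the ground-truth
    surface, approximated by a densely sampled surface point cloud."""
    d, _ = cKDTree(gt_surface_pts).query(pred_pts)
    return float(d.mean())
```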
Keywords: single-view 3D human reconstruction; clothed feature learning; sampling space; flexible deformation; signed distance field