To address the problem of 3D human reconstruction in computer vision, an end-to-end network framework is proposed to reconstruct an accurate textured 3D human mesh from a single color image under hybrid 3D and 2D supervision. Four encoders are used to extract shape and pose features, texture features, illumination parameters, and camera parameters, respectively. The extracted features are fed into a 3D regression module that iteratively infers the parameters of the 3D human model, while the texture features are fed into a texture decoder network to obtain texture maps. The learned human model parameters are then transformed into the 3D human mesh. For the loss function, the difference between the predicted human mesh vertices and the ground truth provides 3D supervision; a 2D rendering loss is computed from the predicted camera parameters, illumination parameters, and mapped texture; and a 2D joint reprojection loss is computed by projecting the 3D joints onto the image plane and comparing them with the ground-truth 2D joints. The discriminator of a generative adversarial network (GAN) is used to make the rendered images more realistic. Qualitative and quantitative experimental results show that the proposed method achieves performance comparable to state-of-the-art 3D human reconstruction methods. Moreover, the reconstructed 3D human mesh possesses a texture map corresponding to the input human image.
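The 2D joint reprojection loss described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a weak-perspective camera parameterized by a scale and a 2D translation (a common choice in regression-based human mesh recovery), and the function names are hypothetical.

```python
import numpy as np

def project_joints(joints3d, scale, trans):
    """Weak-perspective projection of 3D joints to the image plane.

    joints3d: (J, 3) array of 3D joint positions.
    scale: scalar camera scale.
    trans: (2,) camera translation in the image plane.
    Returns a (J, 2) array of projected 2D joints.
    """
    return scale * joints3d[:, :2] + trans

def joint_reprojection_loss(joints3d, scale, trans, joints2d_gt):
    """Mean squared error between projected and ground-truth 2D joints."""
    pred2d = project_joints(joints3d, scale, trans)
    return float(np.mean((pred2d - joints2d_gt) ** 2))
```

In practice the same projection is applied per training sample with predicted camera parameters, and the loss is backpropagated jointly with the 3D vertex and 2D rendering losses.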
3D human reconstruction; deep learning; skinned multi-person linear (SMPL) model; shape and pose; texture