A Dynamic Virtual Try-On Method with an STN-Based Clothing Warping Network
Dynamic virtual try-on aims to generate coherent, smooth, and realistic try-on videos. Existing methods often suffer from clothing self-occlusion and blurred patterns caused by changes in body posture. This paper therefore proposes a clothing warping constraint and prediction method based on the Spatial Transformer Network (STN). In the clothing warping network, a Transformer module exploits both global context and local key information to strengthen the feature response in the garment region, and the STN module uses learnable Thin-Plate Spline (TPS) interpolation to predict the clothing deformation and produce the warped clothing image and its mask. The try-on network uses a U-Net with a self-attention mechanism to align the warped clothing image, its mask, and the human body representation, generating high-quality try-on images. Finally, a dynamic synthesis network enforces temporal consistency across video frames and produces a coherent, high-quality try-on video. On the VVT dataset, compared with CP-VTON, the proposed method improves the average Structural Similarity Index (SSIM) by 0.076 and lowers the average Learned Perceptual Image Patch Similarity (LPIPS) by 0.420; compared with FW-GAN, it reduces the I3D metric by 0.089 and the ResNeXt101 metric by 2.252. On the VITON-HD dataset, the SSIM of the proposed method also exceeds that of CP-VTON and FW-GAN, further indicating that the generated images are of high quality with low distortion.
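For illustration only, the following PyTorch sketch shows how a learnable TPS warp of the kind described above (an STN that predicts control points and resamples the clothing image and mask) can be realized. The 5x5 control-point grid, the 256x192 resolution, and the names TPSGridGen and tps_u are assumptions made for this sketch, not details taken from the paper; the regressor that predicts control-point offsets from person and clothing features is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

def tps_u(r2):
    # TPS radial basis U(r) = r^2 * log(r^2); the epsilon avoids log(0) at control points
    return r2 * torch.log(r2 + 1e-9)

class TPSGridGen(nn.Module):
    # Builds a dense sampling grid from predicted TPS control points (5x5 grid assumed).
    def __init__(self, out_h, out_w, grid_size=5):
        super().__init__()
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, grid_size),
                                torch.linspace(-1, 1, grid_size), indexing="ij")
        src = torch.stack([xs.flatten(), ys.flatten()], dim=1)      # (K, 2) source control points
        self.K = src.shape[0]
        d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(-1)     # pairwise squared distances
        P = torch.cat([torch.ones(self.K, 1), src], dim=1)          # affine part [1, x, y]
        L = torch.zeros(self.K + 3, self.K + 3)
        L[:self.K, :self.K] = tps_u(d2)
        L[:self.K, self.K:] = P
        L[self.K:, :self.K] = P.t()
        self.register_buffer("L_inv", torch.inverse(L))
        self.register_buffer("src", src)
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, out_h),
                                torch.linspace(-1, 1, out_w), indexing="ij")
        pts = torch.stack([xs.flatten(), ys.flatten()], dim=1)      # (N, 2) dense output coordinates
        d2 = ((pts[:, None, :] - src[None, :, :]) ** 2).sum(-1)
        self.register_buffer("rep", torch.cat([tps_u(d2), torch.ones(pts.shape[0], 1), pts], dim=1))
        self.out_h, self.out_w = out_h, out_w

    def forward(self, dst):                                         # dst: (B, K, 2) predicted control points
        B = dst.shape[0]
        rhs = torch.cat([dst, dst.new_zeros(B, 3, 2)], dim=1)       # solve L @ coeff = [dst; 0]
        coeff = torch.matmul(self.L_inv.unsqueeze(0), rhs)
        grid = torch.matmul(self.rep.unsqueeze(0), coeff)           # evaluate the warp densely
        return grid.view(B, self.out_h, self.out_w, 2)

# Usage sketch: a regressor (not shown) would predict control-point offsets from person
# and clothing features; zero offsets give the identity warp here.
tps = TPSGridGen(out_h=256, out_w=192, grid_size=5)
offsets = torch.zeros(1, tps.K, 2)
grid = tps(tps.src.unsqueeze(0) + offsets)
cloth, mask = torch.rand(1, 3, 256, 192), torch.ones(1, 1, 256, 192)
warped_cloth = F.grid_sample(cloth, grid, align_corners=True)
warped_mask = F.grid_sample(mask, grid, align_corners=True)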