Transformer-Based Pedestrian Video Inpainting Guided by Pseudo-Spatiotemporal Pose Correction Graph Convolutional Networks
To repair occluded pedestrians in surveillance videos, we propose a pedestrian video inpainting method based on human pose. The method first repairs the incomplete pedestrian pose sequence and then inpaints the video frames under the guidance of the repaired sequence. First, OpenPose is used to extract the occluded human pose sequence from the video. Because of occlusions, some joints of the extracted poses may be unrecognized or inaccurately recognized. We therefore propose a pseudo-spatiotemporal graph convolutional network that repairs the extracted poses and yields an accurate pose sequence. We then propose a Transformer-based pedestrian video inpainting model guided by the repaired pose sequence. On the Human3.6M dataset, the proposed method outperforms previous approaches on four metrics: PSNR, RMSE, SSIM, and LPIPS. In particular, RMSE is improved by 9.50% and LPIPS by 21.67%.
deep learning; graph convolutional network; Transformer; human pose completion; video inpainting
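As an illustration of two of the reported metrics, the following minimal sketch computes RMSE and PSNR between a reference frame and an inpainted frame, each given as a flat list of pixel intensities. The function names, the 8-bit peak value of 255, and the `relative_improvement` helper are assumptions made for this example; the abstract does not state how its percentage improvements were computed, so the helper simply shows one common convention (relative change against a baseline).

```python
import math


def rmse(reference, restored):
    """Root-mean-square error between two equally sized pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - s) ** 2 for r, s in zip(reference, restored)]
    return math.sqrt(sum(squared) / len(squared))


def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` assumes 8-bit pixel values."""
    err = rmse(reference, restored)
    if err == 0:
        return math.inf  # identical frames
    return 20.0 * math.log10(peak / err)


def relative_improvement(baseline, proposed, lower_is_better=True):
    """Percentage change against a baseline (hypothetical helper, one common convention)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (baseline - proposed) if lower_is_better else (proposed - baseline)
    return 100.0 * change / baseline
```

RMSE and LPIPS are lower-is-better metrics, while PSNR and SSIM are higher-is-better, which is why the helper takes a `lower_is_better` flag.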