Learning interlaced sparse Sinkhorn matching network for video super-resolution
Original article links: NSTL | Elsevier
How to effectively fuse inter- and intra-frame spatio-temporal information plays a key role in video super-resolution (VSR). Most existing works rely heavily on the accuracy of motion estimation and compensation for spatio-temporal feature alignment. However, they cannot perform well when suffering from large-scale and complex motions. To this end, this paper introduces an efficient and effective Interlaced Sparse Sinkhorn Matching (ISSM) network for VSR, which aligns supporting frames with the reference one in the feature space by learning an optimal matching between image regions across frames. Specifically, the ISSM divides the input dense affinity matrix into two sparse block matrices: one matches long-distance regions while the other matches short-distance regions; an efficient Sinkhorn method is then applied to each block to learn the optimal matching. Moreover, we insert a residual atrous spatial pyramid pooling module before the ISSM, which flexibly generates multi-scale features frame by frame to capture multi-scale context information in images. The aligned features of each adjacent frame are then fed to a bidirectional temporal fusion module to capture rich temporal information. Finally, the fused features are sent to a frame-wise dynamic reconstruction network to produce an HR frame. Extensive evaluations on three benchmark datasets demonstrate the superiority of our method over state-of-the-art methods in terms of PSNR and SSIM. (c) 2021 Elsevier Ltd. All rights reserved.
Keywords: Video super-resolution; Multi-scale feature; Interlaced sparse Sinkhorn attention; Bidirectional fusion; Dynamic reconstruction