Video Decaptioning Based on Decoupled Spatial-Temporal Transformer

Tu Yifei (涂奕飞)¹, Cai Feifan (蔡非凡)¹, Wang Chao (王超)¹, Ding Youdong (丁友东)¹

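Author information

1. Shanghai Film Academy, Shanghai University, Shanghai 200072; Shanghai Engineering Research Center of Motion Picture Special Effects, Shanghai 200072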

Abstract

While captions convey information, captions burned into the video frames also hinder reuse of the video. This paper proposes a video caption removal algorithm based on a decoupled spatial-temporal Transformer, which removes subtitle text from video sequences and reconstructs the background occluded by the subtitle regions. The overall framework consists of two parts: a subtitle mask extraction module and a subtitle removal module. The former quickly and accurately obtains a binary subtitle mask for the input video sequence. The mask is then fed as auxiliary information to the decoupled spatial-temporal Transformer based subtitle removal module, which removes the subtitle text and restores the background texture, thereby removing captions from the whole video. Compared with existing classical video caption removal methods, the proposed method achieves better performance on image quality metrics such as peak signal-to-noise ratio and structural dissimilarity, as well as in visual quality. The experimental results verify the effectiveness of the method for video caption removal.
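The abstract does not spell out how the spatial and temporal attention are decoupled. The sketch below illustrates one common reading, offered as an assumption rather than the authors' architecture: attention is applied first along the time axis (each spatial location attends to the same location in other frames) and then along the spatial axis (tokens attend within their own frame). The function names `attention`, `decoupled_st_attention`, and `attention_cost` are hypothetical, the projections are identity (no learned weights), and the subtitle mask input is omitted for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(tokens):
    """Single-head self-attention over the second-to-last axis.

    Assumption: identity query/key/value projections, so the sketch
    has no learned parameters. tokens has shape (..., L, D).
    """
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores, axis=-1) @ tokens                    # (..., L, D)


def decoupled_st_attention(x):
    """Apply temporal attention, then spatial attention, with residuals.

    x has shape (T, N, D): T frames, N spatial tokens per frame,
    D feature channels.
    """
    # Temporal step: for each spatial location, attend across the T frames.
    xt = np.swapaxes(x, 0, 1)            # (N, T, D)
    x = x + np.swapaxes(attention(xt), 0, 1)
    # Spatial step: within each frame, attend across the N tokens.
    x = x + attention(x)
    return x


def attention_cost(t, n):
    """Count attention-score entries for joint vs. decoupled attention."""
    joint = (t * n) ** 2                 # one attention over all T*N tokens
    decoupled = n * t ** 2 + t * n ** 2  # N temporal maps + T spatial maps
    return joint, decoupled
```

Under these assumptions, `decoupled_st_attention(np.random.rand(5, 64, 32))` returns an array of the same shape, and `attention_cost(5, 64)` shows why the split is attractive: the decoupled form computes far fewer attention scores than a single joint attention over all frames and positions.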

Key words

video decaptioning/deep learning/Transformer/attention mechanism

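Funding

National Natural Science Foundation of China (61303093)

National Natural Science Foundation of China (61402278)

Natural Science Foundation of Shanghai (19ZR1419100)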

Publication year

2024

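Industrial Control Computer (工业控制计算机)

Industrial Control Computer Technical Committee of the China Computer Federation; Jiangsu Institute of Computing Technology Co., Ltd.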

Impact factor: 0.258

ISSN: 1001-182X

Number of references: 11
