Learning multiscale hierarchical attention for video summarization
Elsevier
In this paper, we propose a multiscale hierarchical attention approach for supervised video summarization. Unlike most existing supervised methods, which employ bidirectional long short-term memory networks, our method exploits the underlying hierarchical structure of video sequences and learns both short-range and long-range temporal representations via intra-block and inter-block attention. Specifically, we first separate each video sequence into blocks of equal length and employ intra-block and inter-block attention to learn local and global information, respectively. Then, we integrate the frame-level, block-level, and video-level representations to predict frame-level importance scores. Next, we conduct shot segmentation and compute shot-level importance scores. Finally, we perform key shot selection to produce video summaries. Moreover, we extend our method into a two-stream framework that leverages both appearance and motion information. Experimental results on the SumMe and TVSum datasets validate the effectiveness of our method against state-of-the-art methods. (c) 2021 Elsevier Ltd. All rights reserved.
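The block-wise attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes parameter-free scaled dot-product self-attention (the paper's attention layers are learned), mean pooling to obtain block-level and video-level vectors, and concatenation as the integration step. The function names `self_attention`, `mean_pool`, and `hierarchical_features` are hypothetical, and the score prediction, shot segmentation, and key shot selection stages are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections (assumption)."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # Each output is a weighted average of all input vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return out


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector (assumed pooling)."""
    d = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(d)]


def hierarchical_features(frames, block_len):
    """Concatenate frame-, block-, and video-level vectors for every frame."""
    # Split the sequence into blocks; the last block is shorter if the
    # number of frames is not a multiple of block_len.
    blocks = [frames[i:i + block_len] for i in range(0, len(frames), block_len)]
    # Intra-block attention: frames attend only within their own block (local).
    local = [self_attention(block) for block in blocks]
    # Inter-block attention: block summaries attend to each other (global).
    block_vecs = self_attention([mean_pool(block) for block in local])
    video_vec = mean_pool(block_vecs)
    fused = []
    for block_index, block in enumerate(local):
        for frame_vec in block:
            # List concatenation: [frame-level | block-level | video-level].
            fused.append(frame_vec + block_vecs[block_index] + video_vec)
    return fused
```

For a toy sequence of four 2-dimensional frame features and `block_len=2`, `hierarchical_features` returns four 6-dimensional vectors. In a full model, a learned regressor would map each fused vector to a frame-level importance score.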
Video summarization; Hierarchical structure; Attention models; Multiscale temporal representation; Two-stream framework