Learning multiscale hierarchical attention for video summarization

NSTL
Elsevier

Abstract: In this paper, we propose a multiscale hierarchical attention approach for supervised video summarization. Unlike most existing supervised methods, which employ bidirectional long short-term memory networks, our method exploits the underlying hierarchical structure of video sequences and learns both short-range and long-range temporal representations via intra-block and inter-block attention. Specifically, we first separate each video sequence into blocks of equal length and employ intra-block and inter-block attention to learn local and global information, respectively. Then, we integrate the frame-level, block-level, and video-level representations for frame-level importance score prediction. Next, we conduct shot segmentation and compute shot-level importance scores. Finally, we perform key shot selection to produce video summaries. Moreover, we extend our method into a two-stream framework, where both appearance and motion information are leveraged. Experimental results on the SumMe and TVSum datasets validate the effectiveness of our method against state-of-the-art methods. (c) 2021 Elsevier Ltd. All rights reserved.
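
The block structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the parameter-free dot-product attention, the mean pooling used to form block-level and video-level vectors, the concatenation used to integrate the three levels, and the names `self_attention` and `multiscale_features` are all assumptions made for illustration. The sketch also assumes the number of frames is divisible by the block length.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    # Parameter-free scaled dot-product self-attention (an assumption;
    # a trained model would use learned query/key/value projections).
    # x has shape (..., n, d); attention runs over the n items.
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ x


def multiscale_features(frames, block_len):
    # frames: (T, d) array of per-frame features, T divisible by block_len.
    T, d = frames.shape
    if T % block_len != 0:
        raise ValueError("T must be divisible by block_len in this sketch")
    blocks = frames.reshape(T // block_len, block_len, d)

    # Intra-block attention: each frame attends only within its own block.
    local = self_attention(blocks)                # (B, L, d)

    # Block-level vectors via mean pooling (assumed), then inter-block
    # attention so each block attends to every other block.
    block_repr = local.mean(axis=1)               # (B, d)
    global_blocks = self_attention(block_repr)    # (B, d)

    # Video-level vector via mean pooling over blocks (assumed).
    video_repr = global_blocks.mean(axis=0)       # (d,)

    # Align all three levels to frames and concatenate (assumed fusion).
    frame_level = local.reshape(T, d)
    block_level = np.repeat(global_blocks, block_len, axis=0)
    video_level = np.broadcast_to(video_repr, (T, d))
    return np.concatenate([frame_level, block_level, video_level], axis=1)


rng = np.random.default_rng(0)
features = multiscale_features(rng.normal(size=(12, 4)), block_len=3)
print(features.shape)
```

In a full pipeline, a small regressor on top of these per-frame features would predict the frame-level importance scores, which are then aggregated per shot before key shot selection.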

Keywords:

Video summarization; Hierarchical structure; Attention models; Multiscale temporal representation; Two-stream framework

Authors:

Zhu, Wencheng; Lu, Jiwen; Han, Yucheng; Zhou, Jie

Affiliation:

Tsinghua Univ

Publication year:

2022

DOI:

10.1016/j.patcog.2021.108312

Pattern Recognition

EI, SCI

ISSN: 0031-3203

Year, volume (issue): 2022, 122

Citations: 14
References: 45