首页|Learning multiscale hierarchical attention for video summarization

Learning multiscale hierarchical attention for video summarization

扫码查看
In this paper, we propose a multiscale hierarchical attention approach for supervised video summarization. Different from most existing supervised methods which employ bidirectional long short-term memory networks, our method exploits the underlying hierarchical structure of video sequences and learns both the short-range and long-range temporal representations via a intra-block and a inter-block attention. Specifically, we first separate each video sequence into blocks of equal length and employ the intrablock and inter-block attention to learn local and global information, respectively. Then, we integrate the frame-level, block-level, and video-level representations for the frame-level importance score prediction. Next, we conduct shot segmentation and compute shot-level importance scores. Finally, we perform key shot selection to produce video summaries. Moreover, we extend our method into a two-stream framework, where appearance and motion information is leveraged. Experimental results on the SumMe and TVSum datasets validate the effectiveness of our method against state-of-the-art methods. (c) 2021 Elsevier Ltd. All rights reserved.

Video summarizationHierarchical structureAttention modelsMultiscale temporal representationTwo-stream framework

Zhu, Wencheng、Lu, Jiwen、Han, Yucheng、Zhou, Jie

展开 >

Tsinghua Univ

2022

Pattern Recognition

Pattern Recognition

EISCI
ISSN:0031-3203
年,卷(期):2022.122
  • 14
  • 45