
Partial Near-duplicate Video Detection Algorithm Based on Transformer Low-dimensional Compact Coding

To address the issues of existing partial near-duplicate video detection algorithms, such as high storage consumption, low query efficiency, and feature extraction that ignores the subtle semantic differences between near-duplicate frames, this paper proposes a partial near-duplicate video detection algorithm based on Transformer compact coding. First, a Transformer-based feature encoder is proposed that learns the subtle semantic differences between a large number of near-duplicate frames. It applies self-attention across the region feature maps of each frame during encoding, effectively reducing the feature dimensionality while enhancing the representational capacity of the encoded features. The encoder is trained with a siamese network that learns the semantic similarities between near-duplicate frames without negative samples, eliminating the heavy and difficult work of hard-negative annotation and making training simpler and more efficient. Second, a keyframe extraction method based on the video self-similarity matrix is proposed. It extracts rich but non-redundant keyframes, so that the keyframe feature sequence describes the original video content more comprehensively, improving performance while greatly reducing the storage and computation overhead of redundant keyframes. Finally, a graph-based temporal alignment algorithm detects and localizes partial near-duplicate video segments using the low-dimensional compact features of the keyframes. The proposed algorithm outperforms existing methods on the public partial near-duplicate video detection dataset VCDB.
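The region-level self-attention step can be pictured with a minimal single-head sketch. All sizes here (a 7×7 region grid, 512-d CNN region features, a 128-d compact code) and the random projection weights are illustrative assumptions; the paper's actual encoder configuration is not specified in the abstract.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def encode_frame(regions, W_q, W_k, W_v, W_down):
    """regions: (R, d) region feature maps of one frame.
    Single-head self-attention across the R regions, then mean-pooling
    and projection to a compact dimension. Illustrative only."""
    Q, K, V = regions @ W_q, regions @ W_k, regions @ W_v
    att = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # (R, R) region-to-region attention
    ctx = att @ V                                  # attended region features
    f = ctx.mean(axis=0) @ W_down                  # pool regions, reduce dimension
    return f / (np.linalg.norm(f) + 1e-12)         # unit-norm compact frame code

rng = np.random.default_rng(0)
d, dk, dc, R = 512, 64, 128, 49                    # assumed sizes (7x7 region grid)
W_q, W_k, W_v = (rng.normal(size=(d, dk)) for _ in range(3))
W_down = rng.normal(size=(dk, dc))
f = encode_frame(rng.normal(size=(R, d)), W_q, W_k, W_v, W_down)
print(f.shape)   # (128,)
```

The point of attending across regions before pooling is that each region's contribution to the frame code is reweighted by its relation to every other region, which is what lets the encoder separate frames that differ only in small local edits.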
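The negative-free siamese training can be sketched as a symmetric cosine objective over a near-duplicate pair. The abstract does not spell out the exact objective, so the SimSiam-style split into encoder outputs (treated as stop-gradient targets) and predictor outputs is an assumption.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def siamese_loss(z1, p1, z2, p2):
    """Symmetric negative-cosine loss over one near-duplicate pair,
    in the style of negative-free siamese training (an assumption;
    the paper's exact loss is not given in the abstract).
    z1, z2: encoder outputs of the two views (stop-gradient targets);
    p1, p2: predictor outputs. Minimized when both views agree."""
    return -(cosine(p1, z2) + cosine(p2, z1)) / 2.0

rng = np.random.default_rng(1)
z = rng.normal(size=64)
loss = siamese_loss(z, z, z, z)   # identical views: loss reaches -1
print(round(loss, 6))             # -1.0
```

Because the loss only pulls matched views together, no hard-negative mining or annotation is needed, which is exactly the training simplification the abstract claims.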
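Keyframe selection from the self-similarity matrix can be sketched with a greedy threshold rule: a frame becomes a keyframe only if it is sufficiently dissimilar to every keyframe already kept, so the set stays rich but non-redundant. The threshold rule and the value `tau=0.85` are illustrative stand-ins for the paper's (unspecified) criterion.

```python
import numpy as np

def extract_keyframes(feats, tau=0.85):
    """feats: (T, d) unit-normalized frame features.
    Builds the (T, T) cosine self-similarity matrix and greedily keeps
    a frame only if its similarity to all kept keyframes is below tau."""
    S = feats @ feats.T                  # video self-similarity matrix
    keep = [0]                           # always keep the first frame
    for t in range(1, len(feats)):
        if S[t, keep].max() < tau:       # redundant w.r.t. kept frames?
            keep.append(t)
    return keep

# Toy video: three distinct "shots", four identical frames each.
frames = np.repeat(np.eye(3, 8), 4, axis=0)   # 12 unit-norm frame features
print(extract_keyframes(frames))              # [0, 4, 8]
```

Only the stored keyframe codes then need to be indexed and compared at query time, which is where the storage and computation savings come from.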
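The final alignment stage can be sketched as a longest-path search over a frame-match graph: every sufficiently similar (query frame, reference frame) pair is a node, edges connect pairs that advance in time in both videos, and the longest path is the detected copied segment. This simplified dynamic program is a stand-in for the paper's graph-network alignment, whose details the abstract does not give.

```python
import numpy as np

def align_segments(sim, tau=0.8):
    """sim: (Q, R) similarity matrix between query and reference keyframes.
    Returns the longest temporally consistent chain of matched pairs."""
    nodes = [(i, j) for i in range(sim.shape[0])
                    for j in range(sim.shape[1]) if sim[i, j] >= tau]
    best = {n: [n] for n in nodes}       # best chain ending at each node
    for a in nodes:                      # nodes are sorted by query index
        for b in nodes:
            # edge a -> b only if b is later in BOTH videos
            if b[0] > a[0] and b[1] > a[1] and len(best[a]) + 1 > len(best[b]):
                best[b] = best[a] + [b]
    return max(best.values(), key=len) if nodes else []

# Toy case: reference frames 2..5 copy query frames 0..3.
sim = np.zeros((5, 8))
for q in range(4):
    sim[q, q + 2] = 1.0
print(align_segments(sim))   # [(0, 2), (1, 3), (2, 4), (3, 5)]
```

The first and last nodes of the returned chain give the temporal boundaries of the near-duplicate segment in both videos, which is the localization output the task requires.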

Partial near-duplicate video detection; Transformer; Video self-similarity matrix; Keyframe extraction

王萍、余圳煌、鲁磊


School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an 710049, China


Journal: Computer Science (计算机科学)
Publisher: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)
Indexing: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.944
ISSN: 1002-137X
Year, Volume (Issue): 2024, 51(5)