Partial Near-duplicate Video Detection Algorithm Based on Transformer Low-dimensional Compact Coding
To address the shortcomings of existing partial near-duplicate video detection algorithms, such as high storage consumption, low query efficiency, and feature extraction modules that do not capture subtle semantic differences between near-duplicate frames, this paper proposes a partial near-duplicate video detection algorithm based on the Transformer. First, a Transformer-based feature encoder is proposed that can learn subtle semantic differences among large numbers of near-duplicate frames. A self-attention mechanism is applied to the feature maps of frame regions during frame feature encoding, effectively reducing the dimensionality of the features while enhancing their representational capacity. The feature encoder is trained with a siamese network, which can effectively learn the semantic similarities between near-duplicate frames without negative samples. This eliminates the need for laborious and difficult negative-sample annotation, making the training process simpler and more efficient. Second, a keyframe extraction method based on the video self-similarity matrix is proposed. This method extracts rich, non-redundant keyframes from the video, allowing a more comprehensive description of the original video content and improving algorithm performance, while significantly reducing the storage and computation overhead associated with redundant keyframes. Finally, a graph-network-based temporal alignment algorithm detects and localizes partial near-duplicate video clips using the low-dimensional, compact encoded features of the keyframes. The proposed algorithm achieves strong experimental results on the publicly available partial near-duplicate video detection dataset VCDB and outperforms existing algorithms.
Partial near-duplicate video detection; Transformer; Video self-similarity matrix; Keyframe extraction
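To make the self-similarity-matrix step concrete, the following is a minimal sketch of how keyframes might be selected from pairwise frame similarities. The abstract does not specify the selection criterion; the `select_keyframes` function, the greedy scan, and the `redundancy_thresh` parameter are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch: build a video self-similarity matrix from frame embeddings and keep
# only frames that are not near-duplicates of already selected keyframes.
# The greedy redundancy threshold is an assumed, illustrative criterion.
import numpy as np

def select_keyframes(frame_features: np.ndarray, redundancy_thresh: float = 0.85):
    """frame_features: (num_frames, dim) frame embeddings from the encoder.
    Returns indices of frames kept as keyframes."""
    # Normalise so the dot product equals cosine similarity.
    feats = frame_features / np.linalg.norm(frame_features, axis=1, keepdims=True)

    # Video self-similarity matrix: sim[i, j] = cosine similarity of frames i and j.
    sim = feats @ feats.T

    keyframes = []
    for i in range(len(feats)):
        # Keep a frame only if it is not redundant with any already kept keyframe.
        if all(sim[i, k] < redundancy_thresh for k in keyframes):
            keyframes.append(i)
    return keyframes

if __name__ == "__main__":
    # Example with random stand-ins for 200 encoder outputs of dimension 256.
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 256)).astype(np.float32)
    print(select_keyframes(feats))
```

Under this sketch, lowering `redundancy_thresh` keeps fewer, more diverse keyframes, which is the trade-off the abstract points to between comprehensive coverage and reduced storage and computation.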