Design of Surveillance Video Summarization System Based on Dynamic Transformer

Ruan Zhijian¹, Peng Li¹

Author Information

1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214000, China

Abstract

A surveillance video summarization system is an important technical tool for extracting key information from large and complex surveillance videos, providing effective support for security management and event analysis. With the popularization of surveillance devices and the rapid growth of surveillance video data, traditional manual summarization methods can no longer meet the demand for fast processing and accurate extraction of the required information, while modern deep learning methods generally suffer from high computational complexity and large numbers of parameters. To address this issue, a dynamic Transformer-based surveillance video summarization model is proposed. The model automatically assigns an appropriate number of tokens to each input video frame: it cascades multiple Transformer models and gradually increases the number of generated tokens, yielding an adaptive activation order. Inference terminates as soon as a sufficiently confident prediction is produced, and feature reuse and attention reuse are adopted to reduce redundant computation. Experimental results show that, compared with traditional models, the dynamic Transformer model improves accuracy, raising the F-score by 3.7% and 0.9% on two public datasets, respectively, while reducing computational complexity by 40%. The model meets both precision and surveillance requirements, demonstrating good generalization.
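The cascade-with-early-exit idea summarized above can be illustrated with a small control loop. This is a minimal, hypothetical sketch, not the authors' implementation: the stage signature, the confidence threshold of 0.9, the token schedule of 49/100/196 tokens (7×7, 10×10 and 14×14 grids), and the `toy_stage` stand-in are all assumptions made for illustration. Feature reuse is reduced to handing the previous stage's feature to the next stage; attention reuse is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stage signature: (frame, num_tokens, reused_feature) -> (class_probs, feature).
# Each stage stands in for one Transformer in the cascade; the real model is not given in the source.
Stage = Callable[[Sequence[float], int, Optional[List[float]]], Tuple[List[float], List[float]]]


@dataclass
class ExitResult:
    probs: List[float]  # class probabilities from the stage that produced the final answer
    exit_index: int     # index of that stage in the cascade
    tokens_used: int    # tokens processed across all stages that actually ran (cost proxy)


def cascade_predict(frame: Sequence[float], stages: Sequence[Stage],
                    token_schedule: Sequence[int], threshold: float = 0.9) -> ExitResult:
    """Run the stages in order with a growing token budget and stop at the first
    prediction whose top class probability reaches `threshold`."""
    if not stages or len(stages) != len(token_schedule):
        raise ValueError("need at least one stage and one token count per stage")
    reused: Optional[List[float]] = None
    probs: List[float] = []
    tokens_used = 0
    for i, (stage, n_tokens) in enumerate(zip(stages, token_schedule)):
        # Simplified feature reuse: the previous stage's feature is passed to the next stage.
        probs, reused = stage(frame, n_tokens, reused)
        tokens_used += n_tokens
        if max(probs) >= threshold:  # confident enough: terminate inference early
            return ExitResult(probs, i, tokens_used)
    # No stage was confident enough: keep the last (largest) stage's prediction.
    return ExitResult(probs, len(stages) - 1, tokens_used)


def toy_stage(frame: Sequence[float], n_tokens: int,
              reused: Optional[List[float]]) -> Tuple[List[float], List[float]]:
    """Hypothetical stand-in for a Transformer: frame[0] in [0, 1] is an 'ease' value and
    confidence grows with the token budget, so easy frames become confident sooner."""
    p = min(1.0, 0.5 + frame[0] * n_tokens / 100)
    feature = [p] if reused is None else reused + [p]  # accumulate the reused features
    return [p, 1.0 - p], feature


if __name__ == "__main__":
    schedule = [49, 100, 196]  # assumed 7x7, 10x10, 14x14 token grids
    stages = [toy_stage] * len(schedule)
    for name, frame in [("easy", [0.9]), ("hard", [0.3])]:
        result = cascade_predict(frame, stages, schedule)
        print(name, result.exit_index, result.tokens_used)
```

With these toy numbers the easy frame exits after the first 49-token stage, while the hard frame runs all three stages (345 tokens in total). Averaged over many frames, early exits of this kind are where a reduction in computation could come from; the 40% figure in the abstract comes from the authors' experiments, not from this sketch.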

Key words

video summarization techniques/dynamic Transformer/computational complexity/feature reuse/attention reuse

Publication Year

2024

Computer Measurement & Control

China Computer Automatic Measurement and Control Technology Association

CSTPCD

Impact factor: 0.546

ISSN: 1671-4598