Video captioning is the process of using computer vision and natural language processing techniques to convert video content into textual descriptions. It has a wide range of applications, including signal recognition and decoding, online video conferencing, video surveillance and security, video translation, and content retrieval. Video captioning models based on deep learning have achieved significant performance gains; however, these models often have high computational complexity and are difficult to deploy on mobile devices with limited computing resources. To address this problem, two lightweight models are proposed for general video captioning and dense video captioning tasks. Both models are based on the UniVL model, and the minimum model architecture that satisfies the requirements of each video captioning task is determined experimentally. To further reduce model size, an adaptive embedding compression strategy is also proposed, which compresses the models according to the characteristics of different video datasets. In addition, knowledge distillation techniques that use information from different layers are employed to optimize the training of the proposed lightweight models, allowing them to exchange information with teacher models and thereby improve performance. Experimental results show that, compared with the baseline model, the proposed lightweight models achieve a 75% reduction in model parameters with a performance decrease of less than 10%.
Key words
video captioning/model compression/lightweight/knowledge distillation/pre-trained model
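The layer-wise knowledge distillation mentioned in the abstract is commonly implemented as a loss that combines a temperature-scaled soft-label term on the output logits with a mean-squared-error term on matched intermediate hidden states. The sketch below illustrates this general recipe only; the function name, the specific loss weighting, and the use of NumPy are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      temperature=2.0, alpha=0.5):
    """Generic layer-wise distillation loss (illustrative, not the
    paper's exact formulation): temperature-scaled KL between teacher
    and student output distributions, plus MSE between matched
    intermediate hidden states."""
    # Soft-label term: KL(teacher || student) at temperature T,
    # averaged over the batch and rescaled by T^2 as is conventional.
    t = softmax(teacher_logits / temperature)
    s = softmax(student_logits / temperature)
    kl = np.sum(t * (np.log(t + 1e-12) - np.log(s + 1e-12)))
    kl = kl / student_logits.shape[0]
    # Hidden-state term: mean squared error between matched layers
    # (assumes the student layer has been projected to the teacher's
    # hidden size beforehand).
    mse = np.mean((student_hidden - teacher_hidden) ** 2)
    return alpha * kl * temperature ** 2 + (1.0 - alpha) * mse
```

When student and teacher outputs coincide, both terms vanish and the loss is zero; during training, minimizing this loss pulls the compressed student toward the teacher at both the output and intermediate-layer levels.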