

Lightweight Video Captioning Model and Performance Optimization Based on Knowledge Distillation
Video captioning is the process of converting video content into textual descriptions using computer vision and natural language processing techniques. It has a wide range of applications, including signal recognition and decoding, online video conferencing, video surveillance and security, video translation, and content retrieval. Deep-learning-based video captioning models have achieved significant performance gains, but they are often computationally expensive and difficult to deploy on mobile devices with limited computing resources. To address this problem, two lightweight models are proposed, one for general video captioning and one for dense video captioning. Taking the UniVL model as the baseline, the minimum model architecture that satisfies the requirements of video captioning tasks is determined experimentally. To further reduce model size, an adaptive embedding compression strategy is proposed that compresses the model according to the type of video dataset. Knowledge distillation over information from different layers is then used to optimize the training of the proposed lightweight models, enabling full information exchange with the teacher model to improve their performance. Experimental results show that, compared with the baseline model, the proposed lightweight models reduce the number of parameters by 75% with a performance drop of less than 10%.
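The abstract does not spell out the exact layer-wise distillation objective used in the paper, but the core idea of training a lightweight student against a large teacher is standard. A minimal sketch of the soft-label distillation loss (temperature-scaled KL divergence between teacher and student output distributions, following the common Hinton-style formulation; all function and variable names here are illustrative, not from the paper) might look like:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) over a batch of softened distributions.

    Multiplying by T^2 keeps gradient magnitudes comparable across
    temperatures, as is conventional in knowledge distillation.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = (p_teacher * (np.log(p_teacher) - np.log(p_student))).sum(axis=-1)
    return float(kl.mean() * temperature ** 2)

# When the student already matches the teacher, the loss is zero;
# any mismatch yields a positive penalty that training minimizes.
teacher = np.array([[1.0, 2.0, 3.0]])
student = np.array([[0.0, 0.0, 0.0]])
print(distillation_loss(teacher, teacher))   # 0.0
print(distillation_loss(student, teacher) > 0)
```

In practice this soft-label term is combined with a hard-label task loss, and the "different layers" mentioned above would add analogous matching terms on intermediate representations.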

video captioning; model compression; lightweight; knowledge distillation; pre-trained model

Chen Kai, Tang Zhenhua, Cui Zhenlei, Li Jianze


School of Computer, Electronics and Information, Guangxi University, Nanning 530004, Guangxi, China



Radio Engineering (无线电工程)
The 54th Research Institute of China Electronics Technology Group Corporation

Impact factor: 0.667
ISSN: 1003-3106
Year, Volume (Issue): 2024, 54(11)