Existing video description generation methods extract and combine features in relatively simple ways, causing the model to lose important semantic information related to the video description and limiting accurate description and understanding of the video content. To address these deficiencies, this paper proposes a video description generation method based on enhanced global-local feature fusion. First, different feature extractors are used to extract local and global features from the video clips; then, to model the correlations between these different levels of features (local and global), a feature fusion enhancement network performs feature fusion, enriching the feature information available to the model. The bidirectional long short-term memory network used by the decoder is followed by a reconstruction network, which reconstructs the video feature sequences produced by the encoder, and the descriptive sentences for the video are finally generated by a long short-term memory network. Experimental results on the MSVD and MSR-VTT datasets show that the proposed model significantly improves the accuracy of the generated descriptive sentences.
video description generation; enhanced feature fusion network; natural language processing
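To make the described pipeline concrete, the following is a minimal sketch in PyTorch of the overall structure (local/global feature fusion, a bidirectional LSTM over the fused sequence, a reconstruction head, and an LSTM that produces word logits). The fusion mechanism (concatenation plus a sigmoid gate), all layer sizes, and the module names are assumptions for illustration only, since the abstract does not specify them; word embeddings and attention in the decoder are omitted for brevity. This is not the authors' implementation.

```python
# Illustrative sketch only; fusion via concat + gating is an assumption,
# as are all dimensions and module names.
import torch
import torch.nn as nn

class FusionEnhancement(nn.Module):
    """Fuses per-clip local and global features (assumed: concat + sigmoid gate)."""
    def __init__(self, local_dim, global_dim, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(local_dim + global_dim, hidden_dim)
        self.gate = nn.Linear(local_dim + global_dim, hidden_dim)

    def forward(self, local_feats, global_feats):
        # local_feats, global_feats: (batch, time, dim)
        x = torch.cat([local_feats, global_feats], dim=-1)
        return torch.sigmoid(self.gate(x)) * torch.tanh(self.proj(x))

class CaptionModel(nn.Module):
    """BiLSTM over fused features, a reconstruction head, and an LSTM word decoder."""
    def __init__(self, local_dim, global_dim, hidden_dim, vocab_size):
        super().__init__()
        self.fusion = FusionEnhancement(local_dim, global_dim, hidden_dim)
        self.bilstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.reconstruct = nn.Linear(2 * hidden_dim, hidden_dim)  # rebuilds the fused sequence
        self.decoder = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True)
        self.vocab = nn.Linear(hidden_dim, vocab_size)

    def forward(self, local_feats, global_feats):
        fused = self.fusion(local_feats, global_feats)   # (B, T, H)
        enc, _ = self.bilstm(fused)                      # (B, T, 2H)
        recon = self.reconstruct(enc)                    # reconstruction of the fused sequence
        out, _ = self.decoder(enc)                       # (B, T, H)
        return self.vocab(out), recon, fused

# Toy usage with random tensors standing in for CNN features.
model = CaptionModel(local_dim=512, global_dim=1024, hidden_dim=256, vocab_size=10000)
logits, recon, fused = model(torch.randn(2, 20, 512), torch.randn(2, 20, 1024))
recon_loss = nn.functional.mse_loss(recon, fused)  # auxiliary reconstruction objective
```

The reconstruction loss here is paired with the usual cross-entropy over the word logits during training; the exact weighting between the two terms is another detail the abstract leaves unspecified.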