Visual Story Generation Based on Adaptive Pre-trained Model Matching

Original title: 基于预训练模型自适应匹配的视觉故事生成算法

宁铭¹, 江爱文¹, 崔朝阳¹, 刘长红¹, 王明文²

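Author information

1. School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022
2. School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022; School of Digital Industries, Jiangxi Normal University, Shangrao, Jiangxi 334000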

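Abstract (translated from the Chinese original)

Visual story generation aims to produce, for a sequence of images, an expressive and coherent paragraph that accurately describes the visual content involved. It is an interesting and rapidly developing multimodal research direction at the intersection of computer vision and natural language processing. With the success of pre-trained models on a variety of downstream tasks, visual story generation algorithms based on pre-trained models have also been widely studied. However, because of the differences between data modalities and the resulting semantic gap, pre-trained models suffer from catastrophic forgetting during fine-tuning. How to coordinate pre-trained models for the visual and language modalities is one of the main goals of current research on multimodal pre-trained models. This paper proposes a visual story generation algorithm based on adaptive matching of pre-trained models. On the one hand, it comprehensively mines diverse, complementary information from the image stream, such as visual, relational, and sequential information, to bridge the semantic gap. On the other hand, it uses an adaptive loss to align features across the image and text modalities and to align continuity information within the image stream, which yields good results. The algorithm is compared experimentally with recent state-of-the-art algorithms on the publicly available visual storytelling dataset (VIST). The evaluation results show that the proposed algorithm achieves competitive results on metrics such as image-text relevance, text diversity, and logical coherence of the generated stories.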

Abstract

The visual story generation task is to generate expressive and coherent sentences that describe the visual content of an image sequence. To coordinate pre-trained models for visual and linguistic data, this article proposes a visual story generation algorithm based on adaptive pre-trained model alignment. On the one hand, it comprehensively mines diverse, complementary information from image streams, including visual, relational, and sequential aspects. On the other hand, it applies an adaptive matching loss to align features between the two modalities, as well as continuity information within the image stream. Compared with recent state-of-the-art algorithms on the VIST dataset, the proposed method achieves competitive results in terms of image-text correlation, text diversity, and logical coherence of the story.

Key words

visual storytelling/adaptive matching loss/pre-trained model/multimodal feature/image sequence

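Funding

National Natural Science Foundation of China (61966018)

National Natural Science Foundation of China (62067004)

National Natural Science Foundation of China (62266023)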

Publication year

2024

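Journal of Chinese Information Processing (中文信息学报)

Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences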

Indexed in: CSTPCD, CSCD, CHSSCD, Peking University Core Journals (北大核心)

Impact factor: 0.8

ISSN: 1003-0077

Number of references: 1
