Multi-Modal Text Summarization by Positive and Negative Context Alignment and Fusion
Based on the sequence-to-sequence neural network framework, this paper proposes to model the multi-modal text summarization task using both textual and visual semantic information. Specifically, a primary text encoder and a secondary gated encoder guided by image information encode the multi-modal semantics and align the semantic information of the text and the image. By observing the source text and image content aligned by the multi-modal forward attention mechanism and reverse attention mechanism, the relevant and irrelevant features of each modality's semantic information are obtained, respectively. A forward filter removes the irrelevant information passed by the forward attention mechanism, and a reverse filter removes the relevant information passed by the reverse attention mechanism, so that the semantic information of the text and the image is selectively merged in the forward and reverse directions, respectively. Finally, based on the pointer-generator network, the relevant information is used to build a forward pointer and the irrelevant information is used to build a reverse pointer, generating text summaries compensated with multi-modal semantic information. On the JD Chinese e-commerce dataset, the summaries produced by the proposed model reach 38.40, 16.71 and 28.01 on the ROUGE-1, ROUGE-2 and ROUGE-L metrics, respectively.
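The forward/reverse attention and gated filtering described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name `forward_reverse_attention`, the dot-product scoring, and the sigmoid gate parameters `W_f` and `W_r` are all assumptions made for illustration. Forward attention weights regions most related to the query (relevant features), reverse attention weights the least related regions (irrelevant features), and the two filters gate each context before fusion.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_reverse_attention(query, keys, values, W_f, W_r):
    """Hypothetical sketch of forward/reverse attention with gated filters.

    query  : (d,)   text-side state vector
    keys   : (n, d) image-region features used for scoring
    values : (n, d) image-region features used for the context
    W_f, W_r : (d, 2d) parameters of the forward / reverse filters (assumed)
    """
    scores = keys @ query                  # (n,) alignment scores
    a_fwd = softmax(scores)                # forward attention: relevant regions
    a_rev = softmax(-scores)               # reverse attention: irrelevant regions
    c_fwd = a_fwd @ values                 # relevant image context
    c_rev = a_rev @ values                 # irrelevant image context
    # Forward filter suppresses irrelevant residue in the forward context;
    # reverse filter suppresses relevant residue in the reverse context.
    g_fwd = sigmoid(W_f @ np.concatenate([query, c_fwd]))
    g_rev = sigmoid(W_r @ np.concatenate([query, c_rev]))
    return g_fwd * c_fwd, (1.0 - g_rev) * c_rev
```

In the full model, the two filtered contexts would then feed the forward and reverse pointers of the pointer-generator decoder; here they are simply returned.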
multi-modal text summarization; multi-modal alignment; secondary gated encoding; text-generation model