融合图像信息的越汉跨语言新闻文本摘要方法

Cross-lingual Vietnamese-Chinese news text summarization method with image fusion


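Chinese abstract (translated): [Objective] To remove redundant text information and make summaries more concise, while making full use of image information to make them more accurate, this paper studies a Vietnamese-Chinese cross-lingual news text summarization method that fuses image information. [Methods] First, a text encoder and an image encoder represent the Vietnamese news text and images. Second, an image-text contrastive loss strengthens the consistency between the image and text representations, pushing the Vietnamese representation space toward the language-independent image representation space. Third, an image-text fuser fuses the images and text, strengthening the extraction of key information from the news text. Finally, a summary decoder generates the Chinese summary. [Results] On the Vietnamese-Chinese multimodal cross-lingual summarization dataset constructed in this paper, the summaries generated by this method achieve higher ROUGE scores, informativeness, conciseness, and fluency than those of the comparison methods. [Conclusions] Introducing image information helps generate high-quality cross-lingual summaries; learning the interaction between the two languages directly in a single task reduces the error accumulation caused by decomposing cross-lingual summarization into multiple tasks.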
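The image-text contrastive step in the abstract can be illustrated with a small sketch. The abstract does not give the loss formula, so the code below assumes a common symmetric InfoNCE-style formulation over cosine similarities; the function names, the temperature value of 0.07, and the toy 2-dimensional vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_softmax(row):
    """Numerically stable log-softmax of a list of scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def image_text_contrastive_loss(text_vecs, image_vecs, temperature=0.07):
    """Symmetric InfoNCE-style loss (an assumed form, not the paper's).

    text_vecs[i] and image_vecs[i] are treated as a matching pair;
    every other image in the batch serves as a negative for text i,
    and vice versa.
    """
    n = len(text_vecs)
    # sim[i][j]: temperature-scaled similarity of text i and image j.
    sim = [[cosine_similarity(t, v) / temperature for v in image_vecs]
           for t in text_vecs]
    # Text-to-image direction: the correct image for text i is column i.
    t2i = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    # Image-to-text direction: transpose and repeat.
    sim_t = [list(col) for col in zip(*sim)]
    i2t = -sum(log_softmax(sim_t[j])[j] for j in range(n)) / n
    return 0.5 * (t2i + i2t)


# Toy batch: two text vectors with aligned and misaligned image vectors.
text_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_images = [[0.9, 0.1], [0.1, 0.9]]
shuffled_images = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = image_text_contrastive_loss(text_batch, aligned_images)
loss_shuffled = image_text_contrastive_loss(text_batch, shuffled_images)
print(loss_aligned < loss_shuffled)
```

Under this assumed form, minimizing the loss pulls each text representation toward its paired image and away from the other images in the batch, which is one way to realize the "consistency" between the two representation spaces that the abstract describes.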

English abstract: [Objective] The Vietnamese-Chinese cross-language news summarization task aims to convert Vietnamese news into concise, accurate, and readable Chinese summaries. Existing work on this task focuses mainly on summarizing and extracting text information. Although this improves the accuracy of the generated summaries to a certain extent, it ignores the importance of images in news reports.

[Methods] This paper therefore proposes a Vietnamese-Chinese cross-language news text summarization method that integrates image information, and explores how to use image information effectively. Because image-text cross-language summarization datasets are lacking, the paper constructs a real dataset of 142 000 news sample pairs and 235 770 news images collected from multiple Vietnamese news websites. First, the Vietnamese news text and images are represented using a text encoder and an image encoder. Second, an image-text contrastive loss is used to enhance the consistency of the image and text representations, forcing the Vietnamese representation space to approach the language-independent image representation space. Third, an image-text fuser is used to fuse images and text, enhancing the ability to extract key information from news text. Finally, a summary decoder is used to generate the Chinese summary.

[Results] To demonstrate the effectiveness of the method, its performance is compared with that of six baseline methods on the dataset constructed in this paper. First, the experimental results show that the model improves significantly over traditional cross-language summarization models. Second, comparisons with end-to-end cross-language summarization models such as NCLS indicate that integrating image information effectively improves cross-language summarization performance. The paper also reports ablation experiments. Model performance drops significantly after the image encoding module and the image-text fusion module are removed, and it also drops after the image-text contrastive loss module is removed. Randomly selecting an image, or replacing the image with one synthesized from Gaussian noise, likewise reduces model performance. In addition, a hyper-parameter analysis examines how the ratio between the number of text encoding layers and the number of image-text encoding layers affects overall performance. The ROUGE score is highest when 3 layers are used as text encoders and 3 layers as image-text encoders. Finally, a manual evaluation is added to further validate the summaries generated by the model. MH-CLS scores higher on information content, conciseness, and fluency than the Sum-Trans, Trans-Sum, and MCLAS models, which further suggests that the method is effective.

[Conclusions] The proposed Vietnamese-Chinese cross-language news text summarization method that fuses image information achieves significant improvements over existing cross-language summarization methods. Analysis of the experimental results shows that adding image information and the image-text contrastive module to guide summary generation plays an important role in improving the quality of cross-language news summaries. Image-text fusion and key information extraction make full use of the synergy between images and text, so the method extracts key information better and achieves satisfactory results in terms of summary information volume, accuracy, and information richness. These advantages demonstrate the role of images in cross-language summarization and show that the approach can use image information to improve both the quality and the understandability of summaries.
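The results above are reported as ROUGE scores. As a reminder of what that metric measures, here is a minimal sketch of ROUGE-N computed from clipped n-gram overlap. It assumes character-level tokenization for Chinese text, which may differ from the tokenization the authors used, and the two example sentences are hypothetical rather than drawn from the paper's dataset.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, F1) of ROUGE-N.

    Assumption: each character is one token, a simple choice for
    Chinese that avoids needing a word segmenter.
    """
    cand = ngram_counts(list(candidate), n)
    ref = ngram_counts(list(reference), n)
    # Counter intersection clips each n-gram count to the smaller side.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical reference and generated summaries.
reference_summary = "越南总理访问中国"
generated_summary = "越南总理访华"

precision, recall, f1 = rouge_n(generated_summary, reference_summary, n=1)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

In this toy case five of the six generated characters also appear in the eight-character reference, so precision is 5/6, recall is 5/8, and F1 is 5/7. Published ROUGE implementations add options such as stemming and ROUGE-L, which this sketch omits.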

English keywords:

cross-lingual summarization; Vietnamese-Chinese cross-lingual news summarization; text-image fusion; text-image contrastive loss

Authors:

Wu Qiyuan (吴奇远), Yu Zhengtao (余正涛), Huang Yuxin (黄于欣), Tan Kaiwen (谭凯文), Zhang Yongbing (张勇丙)

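Author affiliations:

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China

Yunnan Key Laboratory of Artificial Intelligence, Kunming 650500, Yunnan, China

Yunnan International Joint Laboratory of Machine Translation and Applications for South and Southeast Asian Languages, Kunming University of Science and Technology, Kunming 650500, Yunnan, China

Engineering Research Center of Speech and Language Information Processing for South and Southeast Asian Languages (Ministry of Education), Kunming University of Science and Technology, Kunming 650500, Yunnan, China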

Chinese keywords:

跨语言摘要; 越汉跨语言新闻摘要; 图文融合; 图文对比损失

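Funding (project numbers):

National Natural Science Foundation of China (U21B2027, 61972186, 62266027, 62266028); Yunnan Provincial Major Science and Technology Special Plan Projects (202302AD08003, 202202AD080003); Yunnan Provincial Basic Research Program Projects (202301AT070393, 202301AT070471); Kunming University of Science and Technology "Double First-Class" Joint Construction Project (202201BE070001-021)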

Publication year:

2024

DOI:

10.6043/j.issn.0438-0479.202309001

Journal of Xiamen University (Natural Science) (厦门大学学报(自然科学版))

Xiamen University

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.449

ISSN: 0438-0479

Year, volume (issue): 2024, 63(4)