Research on Multi-document Summarization Combined with Pre-training

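Chinese abstract (translated): News text summarization aims to quickly and accurately distill a concise summary from large, complex news texts. This paper studies multi-document summarization based on pre-trained language models. It focuses on how specific training schemes that incorporate pre-training tasks improve model performance and strengthen information exchange among documents, so that the generated summaries are more comprehensive and more concise. For the incorporation of pre-training tasks, comparative experiments are conducted on the baseline model and on the content, number, and order of pre-training tasks. Effective pre-training tasks are identified, specific methods for strengthening information exchange among documents are summarized, and a concise and efficient pre-training procedure is proposed. Training and testing on a public multi-document news dataset show that the content, number, and order of the pre-training tasks each improve ROUGE scores to some extent, and that the specific pre-training combination derived by integrating these three findings improves ROUGE scores markedly.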

English title: Study on Pre-training Tasks for Multi-document Summarization

English abstract: News summarization aims to quickly and accurately extract a concise summary from complex news text. This paper studies multi-document summarization based on pre-trained language models, focusing on how model training methods combined with pre-training tasks improve model performance and strengthen information exchange between multiple documents, so as to generate more comprehensive and concise summaries. For combined pre-training tasks, this paper conducts comparative experiments on the baseline model and on pre-training task content, quantity, and order; identifies effective pre-training tasks; summarizes specific methods for strengthening information exchange between documents; and proposes a concise and efficient pre-training process. Experimental results from training and testing on a public multi-document news dataset show that the content, quantity, and order of the pre-training tasks each improve the ROUGE score to some extent, and that the specific pre-training combination proposed by integrating these three findings increases the ROUGE score significantly.
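The abstract reports its gains as ROUGE scores. As an illustration of what that metric measures, here is a minimal, hypothetical Python sketch of ROUGE-N: clipped n-gram overlap between a candidate and a reference summary, reported as precision, recall, and F1. It is not the authors' evaluation code, which this record does not describe. The function names `ngrams` and `rouge_n`, the lowercase whitespace tokenization, and the example sentences are illustrative assumptions, and the official ROUGE toolkit adds options such as stemming that are omitted here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: precision, recall and F1 of clipped n-gram overlap.

    Assumption: lowercase whitespace tokenization; no stemming or stopword
    handling, unlike the configurable official ROUGE toolkit.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter & Counter keeps the minimum count of each shared n-gram,
    # so a repeated n-gram is only credited as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy pair, not taken from the paper's dataset.
    reference = "the model generates a concise summary of the news"
    candidate = "the model produces a concise news summary"
    for n in (1, 2):
        p, r, f = rouge_n(candidate, reference, n)
        print(f"ROUGE-{n}: P={p:.3f} R={r:.3f} F1={f:.3f}")
    # ROUGE-1: P=0.857 R=0.667 F1=0.750
    # ROUGE-2: P=0.333 R=0.250 F1=0.286
```

In this toy pair only two bigrams match ("the model" and "a concise"), so ROUGE-2 is much lower than ROUGE-1 even though most individual words overlap.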

English keywords:

News summarization; Pre-training; Multi-document; Information exchange

Authors:

DING Yi, WANG Zhongqing

Affiliation:

School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006

Keywords:

News summarization; Pre-training; Multi-document; Information exchange

Publication year:

2024

DOI:

10.11896/jsjkx.230300160

Journal: Computer Science

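Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)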

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.944

ISSN: 1002-137X

Year, volume (issue): 2024, 51(z1)

Number of references: 26