Siamese Network-Based Text Semantic Matching for Multi-Document Summarization

Original title: 基于孪生网络文本语义匹配的多文档摘要

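Abstract (translated from Chinese): Multi-document summarization aims to extract, from a set of topically related documents, the sentences that best represent the central content of the set and use them as the summary. Text semantic matching refers to learning the semantic relationship between two text units so that sentence representations carry richer semantic information. This paper proposes an extractive multi-document summarization method based on Siamese-network text semantic matching. The method combines a Siamese network with the pre-trained language model BERT to build a model that jointly learns text semantic matching and text summarization. The model uses the Siamese network to examine the semantic association between any two text units from different perspectives, learns the fragmented information scattered across the document set, and further evaluates which information is important. Finally, together with the text summarization model, it selects the sentences that better represent the main content of the document set and organizes them into a summary. Experimental results show that, compared with current mainstream extractive multi-document summarization methods, the proposed method achieves a considerable improvement on the ROUGE evaluation metrics.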
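The idea of scoring pairs of text units with one shared encoder and then picking representative sentences can be illustrated with a minimal sketch. This is not the authors' model: the paper uses BERT as the encoder and trains matching and summarization jointly, whereas the sketch below swaps in bag-of-words counts, cosine similarity, and an untrained average-similarity score. The function names `encode`, `match`, and `select_summary` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def encode(sentence):
    """Shared encoder (assumption: bag-of-words counts stand in for BERT).

    Both branches of the Siamese pair call this same function, which is
    what makes the two sides share their "weights".
    """
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def match(a, b):
    """Semantic-matching score for two text units via the shared encoder."""
    return cosine(encode(a), encode(b))


def select_summary(sentences, k=2):
    """Pick the k sentences with the highest average match to all others.

    Assumption: average pairwise similarity is used as a simple proxy for
    "importance"; the paper learns this judgement instead.
    """
    vectors = [encode(s) for s in sentences]
    scores = []
    for i, u in enumerate(vectors):
        others = [cosine(u, v) for j, v in enumerate(vectors) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(top)]
```

Calling `select_summary` on sentences pooled from several documents returns the ones most similar to the rest of the pool, so an off-topic sentence that matches nothing else is ranked last.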

English abstract: Multi-document summarization aims to extract, as a summary, the sentences that best represent the central content of a document set. Text semantic matching refers to learning the semantic relationship between two text units so that sentence representations carry richer semantic information. This paper proposes an extractive multi-document summarization method based on Siamese-network text semantic matching. The method combines a Siamese network with the pre-trained model BERT to construct a joint learning model for text semantic matching and text summarization. The model uses the Siamese network to examine the semantic association between any two text units from different perspectives, learns the fragmented information in the document set, and finally works with the text summarization model to select sentences that represent the main content of the document set. Experimental results show that, compared with current mainstream extractive multi-document summarization methods, the proposed method achieves a substantial improvement on the ROUGE metrics.
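Because the results are reported in terms of ROUGE, a small worked example of the simplest variant may help. The sketch below computes ROUGE-1 (unigram overlap) precision, recall, and F1 between a candidate summary and one reference. It is a simplification under stated assumptions: whitespace tokenization, lowercasing, and no stemming or stopword handling. The abstract does not say which ROUGE variants or settings were used, and the function name `rouge_1` is hypothetical.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """ROUGE-1 scores from clipped unigram overlap (simplified sketch)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For the candidate "the cat sat" and the reference "the cat sat on the mat", three of the six reference unigrams are covered, giving a recall of 0.5 and a precision of 1.0.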

English keywords:

multi-document extractive summarization; semantic relation; pre-training language model

Authors:

钟琪 (Zhong Qi), 王中卿 (Wang Zhongqing), 王红玲 (Wang Hongling)

Affiliation:

School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China

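Keywords (Chinese):

多文档抽取式摘要 (multi-document extractive summarization); 语义关系 (semantic relation); 预训练语言模型 (pre-trained language model)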

Funding:

National Natural Science Foundation of China

Grant number:

61976146

Publication year:

2024

Journal of Chinese Information Processing (中文信息学报)

Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences

CSTPCD; CHSSCD; Peking University Core Journals (北大核心)

Impact factor: 0.8

ISSN: 1003-0077

Year, volume (issue): 2024, 38(5)

Number of references: 1