计算机工程与应用 2025, Vol.61, Issue(2): 170-178. DOI:10.3778/j.issn.1002-8331.2309-0048

PLSGA:阶段式长文本摘要生成方法

PLSGA:Phase-Wise Long Text Summary Generation Approach

方缙 李宝安 游新冬 吕学强

Author Information

  • 1. School of Computer Science, Beijing Information Science and Technology University, Beijing 100101, China; Beijing Key Laboratory of Internet Culture and Digital Dissemination, Beijing Information Science and Technology University, Beijing 100101, China


Abstract

Aiming at the problems that existing methods have difficulty handling redundant information and cannot select the highest-quality summary when dealing with long text, this paper proposes a phase-wise long text summary generation approach (PLSGA). First, the text of each sample and its reference summary are segmented, and Sentence-BERT is used to obtain semantic vectors and compare their similarity, extracting the key information of the text. An extraction model is then trained on the key and non-key information, so as to retain as much of the original text's semantic information as possible. The extracted key information and the reference summaries are fed as samples into the backbone model BART to train the generative model. Finally, the generative model produces multiple candidate summaries, and a no-reference summary scoring model selects the one of highest quality. The proposed phase-wise long text summary generation approach is evaluated on multiple Chinese long-text datasets. The results show that it improves on current mainstream methods as well as ChatGPT, has domain advantages, and generates summaries of better quality and readability.
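The two pipeline steps that the abstract describes most concretely — labeling key sentences by embedding similarity against the reference summary, and picking the best of several candidate summaries with a scorer — can be sketched as below. This is a minimal illustration, not the paper's implementation: the toy 2-D vectors and the 0.6 threshold are assumptions standing in for Sentence-BERT embeddings, and `score_fn` stands in for the paper's learned no-reference scoring model.

```python
import numpy as np

def label_key_sentences(doc_vecs, ref_vecs, threshold=0.6):
    """Mark a document sentence as "key" when its highest cosine similarity
    to any reference-summary sentence reaches the threshold.

    doc_vecs: (n_doc, d) sentence embeddings of the source document
    ref_vecs: (n_ref, d) sentence embeddings of the reference summary
    Returns a boolean array of length n_doc.
    """
    doc = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    ref = ref_vecs / np.linalg.norm(ref_vecs, axis=1, keepdims=True)
    sims = doc @ ref.T                      # (n_doc, n_ref) cosine matrix
    return sims.max(axis=1) >= threshold

def select_best(candidates, score_fn):
    """Return the candidate summary with the highest no-reference score."""
    scores = [score_fn(c) for c in candidates]
    return candidates[int(np.argmax(scores))]

# Toy 2-D "embeddings" standing in for Sentence-BERT vectors.
doc_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ref_vecs = np.array([[1.0, 0.0]])
print(label_key_sentences(doc_vecs, ref_vecs))  # [ True False  True]
```

In the paper's setting, the labeled key/non-key sentences become training data for the extraction model, and `select_best` would be applied to the multiple candidates produced by BART (e.g., via beam search returning several sequences).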


Key words

text summarization; Sentence-BERT; key information; BART; no-reference summary scoring model


Publication Year: 2025
Journal: 计算机工程与应用 (Computer Engineering and Applications)
Publisher: 华北计算技术研究所 (North China Institute of Computing Technology)
Indexing: CSCD; Peking University Core Journals
Impact Factor: 0.683
ISSN: 1002-8331