计算机工程与应用 2025, Vol.61, Issue(2): 170-178. DOI:10.3778/j.issn.1002-8331.2309-0048

PLSGA:阶段式长文本摘要生成方法

PLSGA:Phase-Wise Long Text Summary Generation Approach

方缙 李宝安 游新冬 吕学强

Author Information

  • 1. School of Computer Science, Beijing Information Science and Technology University, Beijing 100101, China; Beijing Key Laboratory of Internet Culture and Digital Dissemination, Beijing Information Science and Technology University, Beijing 100101, China


Abstract

Aiming at the problems that existing methods have difficulty handling redundant information and cannot select the highest-quality summary when dealing with long text, this paper proposes a phase-wise long text summary generation approach (PLSGA). First, the text of each sample and its reference summary are segmented, and Sentence-BERT is used to obtain semantic vectors and compare their similarity, extracting the key information of the text. An extraction model is then trained on the key and non-key information, so as to retain as much of the original text's semantic information as possible. The extracted key information and the reference summaries are fed as samples into the backbone model BART to train the generative model. Finally, the generative model produces multiple candidate summaries, and a no-reference summary scoring model selects the one of highest quality. The proposed phase-wise long text summary generation approach is evaluated on multiple Chinese long-text datasets. The results show that it improves on current mainstream methods as well as ChatGPT, has domain advantages, and generates summaries of better quality and readability.
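The two pipeline steps that the abstract describes most concretely — labeling key sentences by embedding similarity against the reference summary, and picking the best of several candidate summaries with a scorer — can be sketched as below. This is a minimal illustration, not the paper's implementation: the toy 2-D vectors and the 0.6 threshold are assumptions standing in for Sentence-BERT embeddings, and `score_fn` stands in for the paper's learned no-reference scoring model.

```python
import numpy as np

def label_key_sentences(doc_vecs, ref_vecs, threshold=0.6):
    """Mark a document sentence as "key" when its highest cosine similarity
    to any reference-summary sentence reaches the threshold.

    doc_vecs: (n_doc, d) sentence embeddings of the source document
    ref_vecs: (n_ref, d) sentence embeddings of the reference summary
    Returns a boolean array of length n_doc.
    """
    doc = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    ref = ref_vecs / np.linalg.norm(ref_vecs, axis=1, keepdims=True)
    sims = doc @ ref.T                      # (n_doc, n_ref) cosine matrix
    return sims.max(axis=1) >= threshold

def select_best(candidates, score_fn):
    """Return the candidate summary with the highest no-reference score."""
    scores = [score_fn(c) for c in candidates]
    return candidates[int(np.argmax(scores))]

# Toy 2-D "embeddings" standing in for Sentence-BERT vectors.
doc_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ref_vecs = np.array([[1.0, 0.0]])
print(label_key_sentences(doc_vecs, ref_vecs))  # [ True False  True]
```

In the paper's setting, the labeled key/non-key sentences become training data for the extraction model, and `select_best` would be applied to the multiple candidates produced by BART (e.g., via beam search returning several sequences).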


Key words

text summarization; Sentence-BERT; key information; BART; no-reference summary scoring model


Publication Year: 2025
Journal: 计算机工程与应用 (Computer Engineering and Applications)
Publisher: 华北计算技术研究所 (North China Institute of Computing Technology)
Indexing: CSCD; Peking University Core Journals
Impact Factor: 0.683
ISSN: 1002-8331