
Summarization Generation Method for Long Text Based on Key Features

In deep-learning-based summarization research, most work targets short text. Summarizing long text faces a series of problems, including inaccurate generated information, redundancy in the target summary, and missing summary sentences. Taking long text as the research object and starting from key information such as keywords and key sentences, this paper proposes a long-text summarization method based on key features. First, an improved LDA-based keyword extraction algorithm obtains the keywords of the source text, and the keywords are encoded into a key-information vector that is used in the computations of the decoding stage. Second, an improved TextRank-based key sentence extraction algorithm extracts key sentences from the source text, thereby compressing it. Finally, a BERT language model and a Transformer model, combined with a copy mechanism, generate the summary from the compressed text, improving the accuracy of summary sentence extraction. Experiments show that the proposed method achieves higher ROUGE scores than mainstream summarization methods on both Chinese and English datasets.
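The abstract does not describe how the paper's improved TextRank variant differs from the original algorithm, so the following is only a minimal sketch of standard TextRank sentence ranking used to compress a source text. The function names (`split_sentences`, `tokenize`, `textrank`, `compress`), the character-level handling of Chinese, and the `keep` parameter are assumptions made for illustration, not the authors' implementation.

```python
import math
import re


def split_sentences(text):
    """Split text on common English and Chinese sentence terminators."""
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Lowercased Latin words/numbers, plus single CJK characters as tokens.

    Assumption: a real system would use a proper Chinese word segmenter.
    """
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", sentence.lower())


def similarity(a, b):
    """Standard TextRank overlap similarity between two token lists."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over the similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            updated.append((1 - damping) + damping * rank)
        scores = updated
    return scores


def compress(text, keep=3):
    """Keep the `keep` highest-ranked sentences, in their original order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]
```

Calling `compress(document, keep=k)` returns the `k` most central sentences in document order. In the pipeline described above, this compressed text, rather than the full source, would be passed to the summary generator.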

long text summary; Bert; key features; Transformer

Zhang Jin, Zhao Fengyu


School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093

Department of Information and Intelligent Engineering, Shanghai Publishing and Printing College, Shanghai 200093


National Natural Science Foundation of China (61602305); Natural Science Foundation of Shanghai (19ZR1477600)

2024

Computer & Digital Engineering
No. 709 Research Institute of China Shipbuilding Industry Corporation


CSTPCD
Impact factor: 0.355
ISSN:1672-9722
Year, volume (issue): 2024, 52(5)