
Abstractive Summarization Based on BERT Model

With the continuous development of deep learning, pre-trained language models have achieved strong results in natural language processing, and automatic text summarization, an important research direction in the field, has benefited from large-scale pre-training as well. In abstractive summarization in particular, a large-scale pre-trained language model can be used to generate a summary that accurately reflects the main ideas of the original text. However, current research still has several shortcomings: the semantic information of the source document is not fully captured, polysemous words are not effectively represented, and generated summaries contain repeated content and lack coherence. To alleviate these problems, this paper proposes a new abstractive summarization model based on the BERT pre-trained language model, TextRank-BERT-PGN-Coverage (TBPC). The model adopts the classic Encoder-Decoder framework, initialized with pre-trained weights, to generate summaries. Experiments on the CNN/Daily Mail dataset show that, compared with existing results in this area, the proposed model achieves better performance.
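The model-name components PGN and Coverage point to the pointer-generator network and coverage mechanism of See et al. (2017); this page gives no formulas, so the LaTeX below restates that standard formulation for reference (it is an assumption that the paper uses it unchanged). At decoder step t, with attention distribution a^t, context vector h_t^*, decoder state s_t, and decoder input x_t:

p_{\mathrm{gen}} = \sigma\left( w_{h^*}^{\top} h_t^{*} + w_s^{\top} s_t + w_x^{\top} x_t + b_{\mathrm{ptr}} \right)

P(w) = p_{\mathrm{gen}} \, P_{\mathrm{vocab}}(w) + \left(1 - p_{\mathrm{gen}}\right) \sum_{i \,:\, w_i = w} a_i^{t}

c^{t} = \sum_{t'=0}^{t-1} a^{t'}, \qquad \mathrm{covloss}_t = \sum_i \min\left(a_i^{t},\, c_i^{t}\right)

The coverage vector c^t accumulates past attention, and penalizing min(a_i^t, c_i^t) makes re-attending to already-covered source positions costly, which targets exactly the repeated-content problem the abstract describes.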

abstractive summarization; TextRank algorithm; BERT model; pointer generator network; coverage mechanism
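The TextRank stage presumably ranks source sentences and keeps the most salient ones before BERT encoding; the abstract does not spell out the procedure, so the sketch below is only a minimal illustration of TextRank sentence selection under that assumption. The similarity measure follows Mihalcea and Tarau (2004) with a +1 inside the logarithms to avoid log(0); the function name textrank_select and the parameter top_k are illustrative, not from the paper.

import math
import networkx as nx

def textrank_select(sentences, top_k=3):
    # Build a sentence-similarity graph: nodes are sentence indices,
    # edge weights are length-normalized word-overlap counts.
    tokens = [set(s.lower().split()) for s in sentences]
    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            # Overlap normalized by sentence lengths (Mihalcea & Tarau, 2004);
            # the +1 guards against log(0) for empty or one-word sentences.
            denom = math.log(len(tokens[i]) + 1) + math.log(len(tokens[j]) + 1)
            if denom > 0 and tokens[i] & tokens[j]:
                graph.add_edge(i, j, weight=len(tokens[i] & tokens[j]) / denom)
    # PageRank over the weighted graph gives each sentence a centrality score.
    scores = nx.pagerank(graph, weight="weight")
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]
    # Return the selected sentences in their original document order,
    # ready to be concatenated and fed to the BERT encoder.
    return [sentences[i] for i in sorted(top)]

For example, textrank_select(doc_sentences, top_k=10) would pass roughly the ten most central sentences of a document on to the encoder, trimming long inputs to fit BERT's input-length limit.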

Zhou Yuan (周圆), Zhang Kun (张琨), Chen Zhiyuan (陈智源), Jiang Haojun (江浩俊), Fang Zizheng (方自正)


Nanjing University of Science and Technology, Nanjing 210094


2024

Computer & Digital Engineering
The 709th Research Institute of China Shipbuilding Industry Corporation


CSTPCD
Impact factor: 0.355
ISSN: 1672-9722
Year, Volume (Issue): 2024, 52(10)