Hybrid Summarization Method for Long Judicial Public Opinion Texts

Abstract: Judicial public opinion summarization aims to generate concise, accurate summaries from lengthy and complex public opinion texts. On long texts, existing automatic summarization methods tend to produce semantically incoherent output and to lose key information. To address this, the paper proposes a hybrid summarization method that combines extractive and abstractive approaches. First, the long text is split into multiple semantic fragments. Second, the RoBERTa-wwm-ext model is fine-tuned with unsupervised contrastive learning to produce representations of these fragments. Third, a dilated gated convolutional neural network extracts the fragments relevant to the summary, which are combined into an extracted text. Finally, the pre-trained language model PEGASUS is fine-tuned to generate the final summary from the extracted text. Experiments on the CAIL 2022 judicial public opinion summarization dataset show that, compared with the baseline models, the method generates summaries with higher ROUGE and BLEU scores, further improving the reliability of the generated summaries.
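The extract-then-generate flow described in the abstract can be illustrated with a minimal sketch. It only shows how the stages connect and is not the authors' implementation. The function names, the `max_chars` and `threshold` parameters, the threshold-based selection rule, and the fallback to the full text are all assumptions made for illustration. `score_fn` stands in for the fine-tuned RoBERTa-wwm-ext encoder plus the dilated gated convolutional extractor, and `generate_fn` stands in for the fine-tuned PEGASUS model.

```python
import re
from typing import Callable, List

# Naive sentence boundary: split after Chinese or Western end punctuation.
# (Assumption: the paper's actual segmentation rule is not given in the abstract.)
_SENTENCE_END = r"(?<=[。！？!?.])"


def split_into_fragments(text: str, max_chars: int = 120) -> List[str]:
    """Pack consecutive sentences into fragments of roughly max_chars characters."""
    pieces = [p for p in re.split(_SENTENCE_END, text) if p.strip()]
    fragments, current = [], ""
    for piece in pieces:
        if current.strip() and len(current) + len(piece) > max_chars:
            fragments.append(current.strip())
            current = piece
        else:
            current += piece
    if current.strip():
        fragments.append(current.strip())
    return fragments


def extract_fragments(fragments: List[str],
                      score_fn: Callable[[str], float],
                      threshold: float = 0.5) -> List[str]:
    """Keep fragments whose relevance score reaches the threshold, in original order."""
    return [f for f in fragments if score_fn(f) >= threshold]


def hybrid_summarize(text: str,
                     score_fn: Callable[[str], float],
                     generate_fn: Callable[[str], str],
                     max_chars: int = 120,
                     threshold: float = 0.5) -> str:
    """Segment, extract summary-relevant fragments, then generate from the extract."""
    fragments = split_into_fragments(text, max_chars)
    selected = extract_fragments(fragments, score_fn, threshold)
    # Assumption: if nothing is selected, fall back to the full text.
    extracted_text = " ".join(selected) if selected else text
    return generate_fn(extracted_text)


# Toy stand-ins so the sketch runs without any trained model.
def keyword_score(fragment: str) -> float:
    """Hypothetical scorer: 1.0 if the fragment mentions 'court', else 0.0."""
    return 1.0 if "court" in fragment.lower() else 0.0


def lead_generate(extracted: str) -> str:
    """Hypothetical generator: return the first sentence of the extracted text."""
    return re.split(_SENTENCE_END, extracted)[0].strip()
```

In a real system, `score_fn` and `generate_fn` would wrap trained models. Keeping them as plain callables makes the extractive and abstractive stages swappable and testable in isolation.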

Keywords:

judicial public opinion summarization; hybrid summarization; pre-trained language model

Authors:

席铁钧、段宗涛、曹建荣、杨博、卜娜娜、刘悦霞、肖媛媛

Affiliation:

School of Information Engineering, Chang'an University, Xi'an, Shaanxi 710018

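Funding:

Shaanxi Province Key Research and Development Program; Shaanxi Province Special Support Program for Leading Talents in Science and Technology Innovation

Project numbers:

2019ZDLGY17-08; TZ0336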

Publication year:

2024

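Journal of Chinese Information Processing (中文信息学报)

Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences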

Indexed in: CSTPCD; CHSSCD; Peking University Core Journals

Impact factor: 0.8

ISSN: 1003-0077

Year, volume (issue): 2024, 38(7)