Hybrid Summarization Method for Long Judicial Public Opinion Texts

Judicial public opinion summarization aims to generate short, accurate summaries from lengthy and complex public opinion texts. On long texts, existing automatic summarization methods suffer from semantic incoherence and loss of key information. To address this, the paper proposes a hybrid summarization method that combines extractive and abstractive approaches. First, the long text is split into multiple semantic segments. Second, the RoBERTa-wwm-ext model is fine-tuned with an unsupervised contrastive learning method to produce representations of these segments. Third, a dilated gated convolutional neural network extracts the segments relevant to the summary, which are combined into an extracted text. Finally, the pre-trained language model PEGASUS is fine-tuned to generate the final summary from the extracted text. Experimental results on the CAIL 2022 judicial public opinion summarization dataset show that, compared with the other baseline models, the method generates summaries with higher ROUGE and BLEU scores, further improving the reliability of the summaries.
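The extract-then-generate control flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the punctuation-based segmentation rule, and the `max_chars` and `top_k` parameters are all assumptions. In particular, `keyword_score` is a toy stand-in for the RoBERTa-wwm-ext representations and the dilated gated CNN extractor, and the `generate` callable is a placeholder for a fine-tuned PEGASUS model.

```python
import re
from typing import Callable, List


def split_into_segments(text: str, max_chars: int = 64) -> List[str]:
    """Split text into sentences, then pack sentences into segments (assumed rule)."""
    # A sentence is a run of non-terminal characters plus its trailing punctuation.
    sentences = re.findall(r"[^。!?.!?]+[。!?.!?]*", text)
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        # Start a new segment when adding this sentence would exceed the budget.
        if current and len(current) + len(sentence) > max_chars:
            segments.append(current)
            current = ""
        current += sentence
    if current:
        segments.append(current)
    return segments


def select_segments(
    segments: List[str], score: Callable[[str], float], top_k: int = 2
) -> List[str]:
    """Keep the top_k highest-scoring segments, restored to document order."""
    ranked = sorted(range(len(segments)), key=lambda i: score(segments[i]), reverse=True)
    keep = sorted(ranked[:top_k])
    return [segments[i] for i in keep]


def hybrid_summarize(
    text: str,
    score: Callable[[str], float],
    generate: Callable[[str], str],
    max_chars: int = 64,
    top_k: int = 2,
) -> str:
    """Extractive stage (segment + select) followed by an abstractive stage."""
    segments = split_into_segments(text, max_chars)
    extracted = "".join(select_segments(segments, score, top_k))
    return generate(extracted)


def keyword_score(segment: str) -> float:
    """Hypothetical scorer: counts a keyword instead of running a trained extractor."""
    return float(segment.lower().count("court"))


if __name__ == "__main__":
    doc = (
        "Court A heard the case. The weather was nice. "
        "The court issued a verdict. People went home."
    )
    # The lambda truncates the extracted text; a real system would call a seq2seq model.
    print(hybrid_summarize(doc, keyword_score, lambda t: t[:60], max_chars=30))
```

Selecting segments before generation keeps the input to the abstractive model short, which is the motivation for the hybrid design on long texts; restoring document order after ranking is one simple way to keep the extracted text readable.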

judicial public opinion summarization; hybrid summarization; pre-trained language model

席铁钧、段宗涛、曹建荣、杨博、卜娜娜、刘悦霞、肖媛媛

School of Information Engineering, Chang'an University, Xi'an, Shaanxi 710018

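Shaanxi Provincial Key Research and Development Program (2019ZDLGY17-08); Shaanxi Provincial Special Support Program, Science and Technology Innovation Leading Talent Project (TZ0336)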

2024

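Journal of Chinese Information Processing
Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences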

CSTPCD; CHSSCD; Peking University Core Journals
Impact factor: 0.8
ISSN: 1003-0077
Year, volume (issue): 2024, 38(7)