Automatic extractive-abstractive summarization model for judicial documents
陈炫言¹, 安娜¹, 孙宇¹, 周炼赤¹
Author information
- 1. 706 Institute, Second Academy of China Aerospace Science and Industry Corporation, Beijing 100854
Abstract
This study investigates combining extractive and abstractive summarization to address two problems: extractive summaries splice core information together stiffly, and abstractive summaries tend to overlook important information when the source text is too long. Extractive summarization can pull out the key information in a text and shorten the source, while abstractive summarization reduces information loss between sequences and strengthens the connections within the text. Building on these complementary properties, an extractive-abstractive automatic summarization model for judicial documents is proposed. By fusing the strengths of both approaches, the model avoids the repeated key information and ungrammatical restructured paragraphs that arise when either approach is used alone, and ensures that the information extracted from legal documents is genuinely complete. Experimental results on a large-scale public dataset of judgment documents from the legal domain show that the model achieves relatively high ROUGE scores, indicating that it improves summary quality.
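The abstract describes a two-stage design in which an extractive step first selects key sentences and shortens the input, and an abstractive step then rewrites the shortened text; results are reported as ROUGE scores. The Python sketch below illustrates only that control flow and a ROUGE computation, under stated assumptions. It is not the authors' model: the frequency-based sentence scorer, the character budget, the identity placeholder for the abstractive stage, and all function names (`split_sentences`, `extract`, `summarize`, `rouge_n`) are hypothetical. The metric shown, character-level ROUGE-N F1, is also an assumption, since the abstract does not say how its ROUGE scores were computed.

```python
import re
from collections import Counter
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Split after Chinese or Western sentence-final punctuation, keeping the mark.
    parts = re.split(r"(?<=[。！？；.!?;])", text)
    return [p.strip() for p in parts if p.strip()]


def extract(text: str, budget: int) -> str:
    """Hypothetical extractive stage: rank sentences by the average
    document-level frequency of their characters, greedily keep the
    highest-ranked ones that fit in `budget` characters, then restore
    the original sentence order."""
    sents = split_sentences(text)
    freq = Counter(ch for ch in text if not ch.isspace())

    def score(s: str) -> float:
        chars = [ch for ch in s if not ch.isspace()]
        return sum(freq[ch] for ch in chars) / max(len(chars), 1)

    ranked = sorted(range(len(sents)), key=lambda i: score(sents[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        if used + len(sents[i]) <= budget:
            chosen.append(i)
            used += len(sents[i])
    return "".join(sents[i] for i in sorted(chosen))


def summarize(text: str, budget: int,
              abstractive: Callable[[str], str] = lambda s: s) -> str:
    # Placeholder abstractive stage (identity by default); a trained
    # sequence-to-sequence model would consume the shortened text here.
    return abstractive(extract(text, budget))


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """Character-level ROUGE-N F1 based on clipped n-gram overlap."""
    def grams(s: str) -> Counter:
        s = "".join(s.split())
        return Counter(s[i:i + n] for i in range(len(s) - n + 1))

    c, r = grams(candidate), grams(reference)
    overlap = sum((c & r).values())  # multiset intersection clips counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)
```

Character-level n-grams are used here because they avoid a word-segmentation step for Chinese text; the greedy budgeted selection mirrors the stated goal of shortening the source before generation, but any scorer could replace the frequency heuristic.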
Key words
automatic summarization/extractive/abstractive/algorithm fusion/judgment documents/legal field/complete coherence
Publication year
2024