Judicial public opinion summarization aims to generate concise summaries from lengthy and complex public opinion texts. Existing automatic text summarization methods struggle to produce semantically coherent output and to preserve key information. To address this issue, this paper proposes a hybrid summarization approach that combines extractive and abstractive methods. First, the long text is segmented into several semantic fragments. Then, an unsupervised contrastive learning method is employed to fine-tune the RoBERTa-wwm-ext model for semantic representation of these fragments. Subsequently, a dilated gated convolutional neural network is used to extract semantically relevant fragments and assemble them into the extractive text. Finally, the pre-trained language model PEGASUS is fine-tuned to generate the final summary from the extracted text. Experimental results on the CAIL 2022 Judicial Opinion Summary Dataset show that, compared with the baseline models, this method generates summaries with higher ROUGE and BLEU scores.
Key words
judicial public opinion summarization/hybrid summarization/pre-trained language model
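
The extract-then-abstract data flow described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: sentence splitting stands in for semantic segmentation, a simple token-overlap score stands in for the RoBERTa-wwm-ext representations and the dilated gated convolutional extractor, and the abstractive step is a caller-supplied function in place of a fine-tuned PEGASUS model. All function names and the parameter `k` are hypothetical.

```python
import re
from collections import Counter
from typing import Callable, List


def segment(text: str) -> List[str]:
    """Split text into fragments at sentence-ending punctuation.

    Assumption: a stand-in for the paper's semantic segmentation step.
    """
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def score_fragments(fragments: List[str]) -> List[float]:
    """Score each fragment by how much of its vocabulary recurs elsewhere.

    Assumption: a toy relevance score replacing the learned extractor
    (contrastively fine-tuned encoder + dilated gated CNN).
    """
    token_sets = [set(re.findall(r"\w+", f.lower())) for f in fragments]
    doc_freq = Counter(t for toks in token_sets for t in toks)
    scores = []
    for toks in token_sets:
        if not toks:
            scores.append(0.0)
            continue
        # Average number of *other* fragments that share each token.
        scores.append(sum(doc_freq[t] - 1 for t in toks) / len(toks))
    return scores


def extract(fragments: List[str], scores: List[float], k: int) -> List[str]:
    """Keep the k highest-scoring fragments, preserving document order."""
    top = sorted(range(len(fragments)), key=lambda i: scores[i], reverse=True)[:k]
    return [fragments[i] for i in sorted(top)]


def hybrid_summarize(text: str, abstractive: Callable[[str], str], k: int = 2) -> str:
    """Extractive stage followed by an abstractive stage.

    `abstractive` is any text-to-text function; in the paper's setting it
    would be a fine-tuned PEGASUS model (not included here).
    """
    fragments = segment(text)
    selected = extract(fragments, score_fragments(fragments), k)
    return abstractive(" ".join(selected))
```

Calling `hybrid_summarize(text, abstractive=my_model_fn, k=2)` passes only the selected fragments to the abstractive stage, which is the purpose of the hybrid design: the generator sees a shorter, more relevant input than the full text.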