Method for Extracting Legal Elements Based on Judgment Text

Abstract: Informatization in courts across China continues to advance, and large volumes of judicial judgment documents have accumulated as a result. Extracting accurate legal elements from these documents is an important prerequisite for court informatization. Using judgment documents from theft cases, this study applies deep learning to build a BERT+BiLSTM+CRF fusion model for extracting key elements from legal documents. The BERT language model addresses polysemy in text feature representation. The BiLSTM network fully learns contextual information. A CRF (conditional random field) obtains the globally optimal label sequence. A visual interface was also built to provide a case-element extraction service. The results show three things. Overall, the BERT+BiLSTM+CRF model built with data augmentation reaches an F1_score of 90.6% on theft cases. For individual elements, the F1_score of each of the ten legal elements of theft cases is at least 81.8%. Finally, the model attains the best predictive performance on 50% of the legal elements, clearly outperforming the other models. These results indicate that the proposed BERT+BiLSTM+CRF fusion model can effectively extract key elements from judgment texts. It can also provide a theoretical basis and effective technical support for court informatization nationwide.
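The abstract states that the CRF obtains the globally optimal label sequence and that results are reported as F1_score. In BiLSTM-CRF taggers, this decoding step is commonly done with the Viterbi algorithm over per-token emission scores and tag-transition scores. The sketch below is a minimal, framework-free illustration, not the authors' implementation, and it rests on several assumptions:

* The scores are plain Python lists rather than outputs of a trained BERT+BiLSTM model.
* The BIO tag scheme and the element type `ITEM` are hypothetical, since the paper's ten theft-case elements are not listed here.
* `viterbi_decode`, `bio_spans` and `span_f1` are illustrative helper names.
* F1 is computed with exact span matching, which may differ from the paper's evaluation protocol.

```python
from typing import List, Optional, Sequence, Set, Tuple

Span = Tuple[str, int, int]  # (element type, start index, end index exclusive)


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> List[int]:
    """Highest-scoring tag sequence of a linear-chain CRF (Viterbi algorithm).

    emissions[t][j]: score of tag j at token t (e.g. a projection of BiLSTM outputs).
    transitions[i][j]: score of tag i being followed by tag j.
    start[j]: score of the sequence starting with tag j.
    """
    n_tags = len(start)
    # best[j]: score of the best path that ends in tag j at the current token
    best = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        prev = best
        best, pointers = [], []
        for j in range(n_tags):
            # Best previous tag to move from into tag j
            i_best = max(range(n_tags), key=lambda i: prev[i] + transitions[i][j])
            best.append(prev[i_best] + transitions[i_best][j] + emissions[t][j])
            pointers.append(i_best)
        backpointers.append(pointers)
    # Start from the best final tag and follow the backpointers
    tag = max(range(n_tags), key=lambda j: best[j])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


def bio_spans(tags: Sequence[str]) -> Set[Span]:
    """Collect element spans from BIO tags; a stray I- tag starts a new span."""
    spans: Set[Span] = set()
    kind: Optional[str] = None
    begin = 0
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" flushes the last span
        continues = tag.startswith("I-") and tag[2:] == kind
        if not continues:
            if kind is not None:
                spans.add((kind, begin, i))
            kind = None if tag == "O" else tag[2:]
            begin = i
    return spans


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match span F1: harmonic mean of precision and recall."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy setup: tags O, B-ITEM, I-ITEM (indices 0, 1, 2)
TAGS = ["O", "B-ITEM", "I-ITEM"]
EMISSIONS = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.1, 0.2, 1.8]]
TRANSITIONS = [
    [0.0, 0.0, -100.0],  # O -> I-ITEM is effectively forbidden
    [0.0, 0.0, 1.0],     # B-ITEM -> I-ITEM is rewarded
    [0.0, 0.0, 0.5],
]
START = [0.0, 0.0, -100.0]  # a sequence effectively may not start with I-ITEM

if __name__ == "__main__":
    labels = [TAGS[k] for k in viterbi_decode(EMISSIONS, TRANSITIONS, START)]
    print(labels)  # ['O', 'B-ITEM', 'I-ITEM']
    print(span_f1({("ITEM", 1, 3)}, bio_spans(labels)))  # 1.0
```

Taking the per-token argmax treats each position independently. Viterbi decoding instead scores whole sequences, so transition scores can rule out invalid outputs such as `O` followed by `I-ITEM`. In a trained model, `emissions` would typically come from a linear layer over the BiLSTM outputs, and `transitions` would be learned CRF parameters.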

Keywords: legal elements; extraction; judgment text; BERT; BiLSTM; CRF

Authors: Dong Yuhong (董玉红), Lu Peng (卢鹏), Chen Jing (陈静), Guo Xingang (郭新刚), Chen Zhen (陈震)

Affiliation: China Justice Big Data Institute Co., Ltd. (中国司法大数据研究院有限公司), Beijing 100041, China

Funding: National Key Research and Development Program of China (Project No. 2021YFC3340103)

Publication year: 2024

DOI: 10.3969/j.issn.1673-5692.2024.06.011

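Journal: Journal of China Academy of Electronics and Information Technology (中国电子科学研究院学报)

Sponsor: China Academy of Electronics and Information Technology (中国电子科学研究院)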

Impact factor: 0.663

ISSN: 1673-5692

Year, volume (issue): 2024, 19(6)