Judicial Argumentation Understanding Method Based on Multiplet Loss

ZHANG Ke (张可)¹, AI Zhongliang (艾中良)², LIU Zhonglin (刘忠麟)³, GU Pingli (顾平莉)¹, LIU Xuelin (刘学林)⁴

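Author information

1. Big Data R&D Center, North China Institute of Computing Technology (华北计算技术研究所), Beijing 100083
2. CETC Development Planning Research Institute Co., Ltd. (中电科发展规划研究院有限公司), Beijing 100041
3. China Justice Big Data Institute Co., Ltd. (中国司法大数据研究院有限公司), Beijing 100043
4. China Satellite Network Group Co., Ltd. (中国卫星网络集团有限公司), Beijing 100029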

Abstract

Judicial argument understanding is an application of argument mining to the judicial domain. It aims to identify interactive argument pairs among the arguments put forward by the prosecution and the defense. Argument mining in this domain has to cope with few training samples, long sentences, and highly specialized language. Existing judicial argument understanding models are mostly built on the idea of text classification, and the resulting models represent text semantics poorly. To further improve the accuracy of identifying interactive argument pairs, a judicial argument understanding model based on a multiplet matching loss function (Multi-plet Loss) is proposed. The model follows the idea of text matching: each prosecution argument is matched against each defense argument by semantic similarity, and interactive pairs are mined by optimizing their matching degree. To raise the matching degree of interactive pairs, the proposed loss reduces the semantic distance between interactive arguments and increases the semantic distance between non-interactive arguments, so that the semantic distance between two arguments better reflects whether they interact. A pre-trained model for the judicial domain is used as the text semantic representation model, which further improves the semantic representation of the text. In tests on the CAIL2022 argument understanding track data, the model based on the multiplet matching loss improves accuracy by 2.04 percentage points over the model built on the classification idea, reaching 85.19%, and thus improves the accuracy of the judicial argument understanding task.
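The abstract does not give the exact form of the multiplet loss, so the following is only a minimal sketch of the general idea it describes: pull the interactive pair together and push non-interactive pairs apart, then predict by picking the closest candidate. The function names, the cosine distance, the hinge form, and the `margin` value are all assumptions made for illustration and are not taken from the paper; in practice the vectors would come from the judicial-domain pre-trained encoder rather than being written by hand.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def multi_negative_margin_loss(anchor, positive, negatives, margin=0.5):
    """Hypothetical margin loss with one positive and several negatives.

    anchor:    embedding of the prosecution argument
    positive:  embedding of the interactive defense argument
    negatives: embeddings of the non-interactive defense arguments
    The loss is zero once every negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = cosine_distance(anchor, positive)
    hinge_terms = [
        max(0.0, margin + d_pos - cosine_distance(anchor, neg))
        for neg in negatives
    ]
    return sum(hinge_terms) / len(hinge_terms)


def predict_interactive(anchor, candidates):
    """Return the index of the candidate closest to the anchor."""
    distances = [cosine_distance(anchor, c) for c in candidates]
    return min(range(len(candidates)), key=distances.__getitem__)


# Toy 2-D embeddings, chosen by hand purely for illustration.
prosecution = [1.0, 0.0]
interactive = [1.0, 0.0]
non_interactive = [[0.0, 1.0], [-1.0, 0.0]]

loss = multi_negative_margin_loss(prosecution, interactive, non_interactive)
best = predict_interactive(
    prosecution, [non_interactive[0], interactive, non_interactive[1]]
)
```

In this toy case the interactive argument already coincides with the prosecution argument and both negatives are far away, so the loss is zero and the prediction picks the interactive candidate. A classification-style model would instead score each pair independently, whereas a matching loss of this kind compares the candidates against one another through their distances.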

Key words

multiplet loss/pre-trained models in judicial domain/judicial argument understanding/argument mining/text classification/natural language processing/deep learning

Funding

National Key R&D Program of China (2022YFC3340900)

Publication year

2024

Computer and Modernization (计算机与现代化)

Jiangxi Computer Society; Jiangxi Institute of Computing Technology

CSTPCD

Impact factor: 0.472

ISSN: 1006-2475