Named Entity Recognition Approach of Judicial Documents Based on Transformer

Wang Yingjie¹, Zhang Chengye¹, Bai Fengbo², Wang Zumin¹

Author information

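1. School of Information Engineering, Dalian University, Dalian 116622, China
2. School of Artificial Intelligence, Guangxi Minzu University, Nanning 530006, China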

Abstract

Named entity recognition (NER) is one of the key tasks in natural language processing and underpins many downstream tasks. Research on NER in the judicial domain remains relatively scarce, and many problems in the informatization and intelligent transformation of the judicial system still need to be solved. Compared with texts from other domains, judicial documents are highly specialized and have few available corpus resources, so existing methods achieve relatively low recognition performance on them. This work therefore addresses the problem in three ways. First, a multi-label hierarchical iterative annotation method (ML-HIA) is proposed, which automatically annotates raw judicial documents and effectively improves entity recognition on them. Second, a feature-mixed Transformer (FM-Transformer) neural network model is proposed for named entity recognition in judicial documents; it makes full use of deep features derived from the inherent attributes of Chinese characters. Finally, the proposed annotation method and model are compared experimentally with other neural network models. The results show that the proposed annotation method labels judicial documents accurately, and that the proposed model improves substantially over the baseline models on a general-domain dataset while achieving good results on judicial-domain datasets.
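The abstract does not describe how ML-HIA actually labels raw text. For orientation only, a common baseline for automatically annotating NER training data is dictionary (gazetteer) matching that emits character-level BIO tags. The sketch below assumes a hypothetical lexicon that maps entity strings to types (`PER`, `LOC`, `ORG`) and uses forward maximum matching; the function name `bio_annotate` and the example entries are illustrative, and this is not the authors' multi-label hierarchical iterative method.

```python
# Minimal sketch: dictionary-based automatic annotation into BIO tags.
# NOT the paper's ML-HIA method; the lexicon and entity types are hypothetical.
from typing import Dict, List


def bio_annotate(text: str, lexicon: Dict[str, str]) -> List[str]:
    """Tag each character of `text` with a BIO label using forward maximum
    matching against `lexicon` (entity string -> entity type)."""
    tags = ["O"] * len(text)
    # Longest lexicon entry bounds the match window.
    max_len = max((len(entry) for entry in lexicon), default=0)
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate first so that, e.g., a full court name
        # wins over the city name it starts with.
        for length in range(min(max_len, len(text) - i), 0, -1):
            label = lexicon.get(text[i:i + length])
            if label is not None:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + length):
                    tags[j] = f"I-{label}"
                i += length
                matched = True
                break
        if not matched:
            i += 1  # no entry starts here; leave this character as "O"
    return tags


# Hypothetical lexicon for illustration only.
lexicon = {"张某": "PER", "大连市": "LOC", "大连市中级人民法院": "ORG"}
sentence = "张某向大连市中级人民法院提起上诉"
for char, tag in zip(sentence, bio_annotate(sentence, lexicon)):
    print(char, tag)
```

Preferring the longest match resolves nested candidates such as `大连市` inside `大连市中级人民法院` in favour of the outer entity; a multi-label scheme like the one the abstract names would presumably need to keep such nested labels rather than discard them.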

Key words

Natural language processing; Data annotation; Transformer model; Deep learning; Judicial informatization

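Publication year: 2024
Journal: Computer Science (计算机科学)
Publisher: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)
Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 0.944
ISSN: 1002-137X
Number of references: 50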

段落导航