Research on Automatic Recognition of Etiquette Entities in Ancient Classical Ritual Literature Based on Multi-Task Joint Learning

Multi-Task Learning for Ancient Ritual Literature Etiquette Entity Recognition

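Abstract (Chinese version): [Objective] General-purpose named entity recognition (NER) models are limited in the specialized domain of ancient texts. To address this, this paper proposes a multi-task deep learning model for the automatic recognition of several types of etiquette entities, aiming to improve the accuracy and efficiency of recognizing complex etiquette entities in ancient books. [Methods] We first build an annotated corpus of etiquette entities covering six categories. We then construct MJL-SikuRoBERTa-BiGRU-CRF, an integrated model that performs etiquette entity recognition and automatic punctuation jointly and incorporates a pretrained language model for classical Chinese. SikuRoBERTa and BiGRU are trained on the corpus to capture contextual semantic information, and a CRF layer then imposes label constraints on both subtasks to produce globally optimal entity and punctuation label sequences. [Results] The proposed model achieves an F1 score of 84.34% on etiquette entity recognition and 75.30% on automatic punctuation. It performs best on the palace, utensil, and costume categories, with F1 scores above 85%, and somewhat worse on the food, vehicle, and local product categories, with F1 scores of 76% to 81%. [Limitations] The model has not been validated on finer-grained entity categories. In addition, we experimented with data augmentation to improve etiquette entity recognition, but did not apply it to all categories. [Conclusions] The integrated model is better suited to etiquette entity recognition in ancient Chinese ritual literature and can effectively support the extraction of ancient ritual information and the automatic construction of knowledge bases.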
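The Methods passage says a CRF layer constrains the labels of both subtasks and outputs globally optimal sequences. Below is a minimal sketch of that decoding step under stated assumptions: BIO entity tags, hard constraints that let an I- tag follow only its own B- or I- tag, and one CRF head per task on top of a shared encoder. The tag names, scores, and the `allowed`/`viterbi` helpers are hypothetical illustrations, not the authors' implementation.

```python
"""Hypothetical sketch: constrained Viterbi decoding for two tagging heads
(entity tags and punctuation tags) that share one encoder. Not the authors' code."""
from typing import Dict, List, Optional, Sequence, Tuple

NEG_INF = float("-inf")


def allowed(prev: str, cur: str) -> bool:
    """Assumed BIO constraint: I-X may only follow B-X or I-X of the same type."""
    if cur.startswith("I-"):
        return prev in (f"B-{cur[2:]}", f"I-{cur[2:]}")
    return True


def viterbi(emissions: Sequence[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: Sequence[str]) -> List[str]:
    """Return the highest-scoring tag path that never violates `allowed`.

    emissions[t][tag]: per-character score a CRF head receives from the encoder
    (SikuRoBERTa + BiGRU in the paper; plain numbers here).
    transitions[(prev, cur)]: CRF transition score; missing pairs count as 0.
    """
    # A sequence may not start with an I- tag.
    score = {tag: NEG_INF if tag.startswith("I-") else emissions[0][tag]
             for tag in tags}
    backpointers: List[Dict[str, Optional[str]]] = []
    for t in range(1, len(emissions)):
        new_score: Dict[str, float] = {}
        bp: Dict[str, Optional[str]] = {}
        for cur in tags:
            best_prev, best = None, NEG_INF
            for prev in tags:
                if score[prev] == NEG_INF or not allowed(prev, cur):
                    continue  # unreachable state or forbidden transition
                s = score[prev] + transitions.get((prev, cur), 0.0)
                if s > best:
                    best_prev, best = prev, s
            new_score[cur] = best + emissions[t][cur]
            bp[cur] = best_prev
        score = new_score
        backpointers.append(bp)
    # Trace back from the best-scoring final tag.
    path = [max(score, key=score.get)]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return path[::-1]


# Toy scores for a 3-character clause (hypothetical numbers and tag names).
ENT_TAGS = ["O", "B-UTENSIL", "I-UTENSIL"]
PUNC_TAGS = ["O", "COMMA", "PERIOD"]  # punctuation mark to insert after each char
ent_emissions = [
    {"O": 0.1, "B-UTENSIL": 0.5, "I-UTENSIL": 0.9},
    {"O": 0.2, "B-UTENSIL": 0.1, "I-UTENSIL": 1.0},
    {"O": 1.0, "B-UTENSIL": 0.0, "I-UTENSIL": 0.3},
]
punc_emissions = [
    {"O": 0.9, "COMMA": 0.1, "PERIOD": 0.0},
    {"O": 0.8, "COMMA": 0.3, "PERIOD": 0.1},
    {"O": 0.1, "COMMA": 0.2, "PERIOD": 0.9},
]
print(viterbi(ent_emissions, {}, ENT_TAGS))   # → ['B-UTENSIL', 'I-UTENSIL', 'O']
print(viterbi(punc_emissions, {}, PUNC_TAGS))  # → ['O', 'O', 'PERIOD']
```

A per-character argmax over `ent_emissions` would give `I-UTENSIL, I-UTENSIL, O`, which opens an entity with an inside tag. The constrained decode returns the best valid path instead. In multi-task setups of this kind, training commonly sums the two heads' CRF negative log-likelihoods. That is an assumption here, and only decoding is shown.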

Abstract (English version): [Objective] This paper proposes a multi-task deep learning model tailored to ancient texts. It overcomes the limitations of current named entity recognition (NER) models and improves the accuracy and efficiency of identifying complex etiquette entities. [Methods] We built an annotated named entity corpus covering six categories and developed an integrated model, MJL-SikuRoBERTa-BiGRU-CRF. SikuRoBERTa and BiGRU extract contextual semantic information, while a CRF layer imposes label constraints on both tasks, generating globally optimal named entity and punctuation label sequences. [Results] The proposed model achieves an F1 score of 84.34% on etiquette entity recognition and 75.30% on automatic punctuation. The palace, utensil, and costume categories perform best, with F1 scores above 85%, while the food, vehicle, and product categories perform slightly worse, with F1 scores of 76% to 81%. [Limitations] The model has not been validated on finer-grained entity categories. We also experimented with data augmentation for etiquette entity recognition, but did not apply it to all categories. [Conclusions] The proposed model is better suited to named entity recognition in classical Chinese ritual texts and can effectively support information extraction and knowledge graph construction related to ancient rituals.
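The Results report F1 values both overall and per category. The abstract does not specify the scoring protocol. The sketch below therefore assumes the common strict span-level metric over BIO tags, where an entity counts as correct only if its category and boundaries both match the gold annotation. The English category labels and the `bio_spans`/`f1_scores` helpers are hypothetical.

```python
"""Hypothetical sketch: strict span-level F1 per category and micro-averaged,
computed from BIO tag sequences. The paper's exact scoring script is not shown."""
from collections import Counter
from typing import Dict, Sequence, Set, Tuple

Span = Tuple[str, int, int]  # (category, start, end) with end exclusive


def bio_spans(tags: Sequence[str]) -> Set[Span]:
    """Collect entity spans; an I-X that does not continue an X span opens a new one."""
    spans: Set[Span] = set()
    start, cat = 0, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes the last span
        if tag.startswith("I-") and tag[2:] == cat:
            continue  # still inside the current span
        if cat is not None:
            spans.add((cat, start, i))
        start, cat = (i, None) if tag == "O" else (i, tag[2:])
    return spans


def f1_scores(gold: Sequence[Sequence[str]],
              pred: Sequence[Sequence[str]]) -> Tuple[Dict[str, float], float]:
    """Return (per-category F1, micro-averaged F1) under exact span matching."""
    tp, n_gold, n_pred = Counter(), Counter(), Counter()
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_spans(gold_tags), bio_spans(pred_tags)
        n_gold.update(span[0] for span in g)
        n_pred.update(span[0] for span in p)
        tp.update(span[0] for span in g & p)  # category and boundaries must match

    def f1(t: int, n_p: int, n_g: int) -> float:
        prec = t / n_p if n_p else 0.0
        rec = t / n_g if n_g else 0.0
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

    per_cat = {c: f1(tp[c], n_pred[c], n_gold[c]) for c in set(n_gold) | set(n_pred)}
    micro = f1(sum(tp.values()), sum(n_pred.values()), sum(n_gold.values()))
    return per_cat, micro


gold = [["B-FOOD", "I-FOOD", "O", "B-VEHICLE"]]
pred = [["B-FOOD", "I-FOOD", "O", "O"]]
per_cat, micro = f1_scores(gold, pred)
print(per_cat["FOOD"], per_cat["VEHICLE"], round(micro, 4))  # → 1.0 0.0 0.6667
```

Micro-averaging pools true positives across categories, so frequent categories dominate the overall score. A macro average over `per_cat` would weight each category equally. Which of these the paper's 84.34% corresponds to is not stated in the abstract.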

Keywords (English version):

Etiquette Entity Recognition; Ancient Ritual Literature; Multi-Task Learning; Pretrained Model for Classical Chinese Language

Authors:

斯日古楞, 林民, 郭振东, 张树钧, 李斌, 高颖杰

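Affiliations:

College of Literature, Inner Mongolia Normal University, Hohhot 010022

College of Computer Science and Technology, Inner Mongolia Minzu University, Tongliao 028043

College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022

School of Computer Science and Technology, Hainan University, Haikou 570228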

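Keywords (Chinese version):

proper-name recognition; ancient ritual literature; multi-task learning; pretrained model for classical Chinese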

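Funding:

National Natural Science Foundation of China; Scientific Research Project of Higher Education Institutions of the Inner Mongolia Autonomous Region; Fundamental Research Funds for Universities Directly Affiliated with the Inner Mongolia Autonomous Region

Grant numbers:

62266033; NJZY23101; GXKY23Z018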

Publication year:

2024

DOI:

10.11925/infotech.2096-3467.2023.0372

Data Analysis and Knowledge Discovery

National Science Library, Chinese Academy of Sciences

Indexed in: CSTPCD; CSSCI; CHSSCD; Peking University Core Journals; EI

Impact factor: 1.452

ISSN: 2096-3467

Year, volume (issue): 2024, 8(7)