Research on Ancient Chinese Relation Classification Incorporating Entity Information

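Abstract: [Objective] To combine entity information with pre-trained language models for the ancient Chinese relation classification task and to build an ancient Chinese relation classification model. [Methods] First, special tokens mark the positions of the entity pair in the input layer of the pre-trained model, and an entity-type description sentence is appended after the original relation sentence. Second, entity semantic information is further extracted from the output of the pre-trained language model. Third, a CNN incorporates the position of each character relative to the head and tail entities into the model. Finally, the sentence representation, the entity semantic representations, and the CNN output are concatenated and passed through a classifier to obtain the relation label. [Results] Compared with using a pre-trained language model alone, the proposed model improves Macro F1 by 3.5 percentage points on average. [Limitations] Analysis of the confusion matrix shows that the model tends to mispredict relations that share the same entity-type combination. [Conclusions] Combining entity information with pre-trained language models improves ancient Chinese relation classification, and the experimental results confirm that the proposed method of fusing entity information is effective.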
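The input construction and position features described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the marker names (`[E1]`, `[/E1]`, `[E2]`, `[/E2]`), the description template, the example sentence, and the entity-type labels are all assumptions made for the example, since the abstract does not specify them.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def mark_entities(sentence: str, head: Span, tail: Span) -> List[str]:
    """Wrap the head and tail entity spans in marker tokens.

    Marker names are hypothetical. Spans are assumed non-empty and
    non-overlapping.
    """
    tokens: List[str] = []
    for i, ch in enumerate(sentence):
        if i == head[0]:
            tokens.append("[E1]")
        if i == tail[0]:
            tokens.append("[E2]")
        tokens.append(ch)
        if i == head[1] - 1:
            tokens.append("[/E1]")
        if i == tail[1] - 1:
            tokens.append("[/E2]")
    return tokens


def build_input(sentence: str, head: Span, tail: Span,
                head_type: str, tail_type: str) -> List[str]:
    """Marked relation sentence followed by an entity-type description.

    The description template below is an assumed placeholder.
    """
    head_text = sentence[head[0]:head[1]]
    tail_text = sentence[tail[0]:tail[1]]
    description = f"{head_text}是{head_type}，{tail_text}是{tail_type}"
    return (["[CLS]"] + mark_entities(sentence, head, tail) + ["[SEP]"]
            + list(description) + ["[SEP]"])


def relative_positions(length: int, span: Span) -> List[int]:
    """Signed distance of each character to an entity span (0 inside it).

    In a model, these integers would typically be mapped to position
    embeddings before being fed to a CNN.
    """
    start, end = span
    positions = []
    for i in range(length):
        if i < start:
            positions.append(i - start)
        elif i >= end:
            positions.append(i - (end - 1))
        else:
            positions.append(0)
    return positions


# Illustrative example (sentence and type labels chosen for the sketch).
sentence = "孔子生鲁昌平乡陬邑"
head, tail = (0, 2), (3, 4)  # "孔子" and "鲁"
tokens = build_input(sentence, head, tail, "人物", "地点")
head_pos = relative_positions(len(sentence), head)
tail_pos = relative_positions(len(sentence), tail)
print(tokens)
print(head_pos)
print(tail_pos)
```

In this sketch the token sequence would go to the pre-trained encoder, while the two position lists would be embedded and passed to the CNN branch; how the paper aligns positions with marker tokens is not stated in the abstract.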

English title: Classifying Ancient Chinese Text Relations with Entity Information

English abstract: [Objective] This paper integrates entity information with pre-trained language models to classify relations in ancient Chinese texts. [Methods] First, we used special tokens in the input layer of the pre-trained model to mark the positions of entity pairs. We also appended entity-type descriptions after the original relation sentences. Second, we extracted semantic information about the entities from the output of the pre-trained language model. Third, we employed a CNN to incorporate the position of each token relative to the head and tail entities into the model. Finally, we concatenated the sentence representations, entity semantic representations, and CNN outputs and passed them through a classifier to obtain relation labels. [Results] Compared with using pre-trained language models alone, the new model's Macro F1 score was 3.5 percentage points higher on average. [Limitations] Analysis of the confusion matrix reveals a tendency for errors when predicting relations that share the same entity-type pair. [Conclusions] Combining entity information with pre-trained language models improves ancient Chinese relation classification.
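For readers unfamiliar with the metric, Macro F1 is the unweighted mean of the per-class F1 scores, so rare relation types count as much as frequent ones. The following is a small sketch of that computation; the relation labels are invented for the example, and taking the label set as the union of gold and predicted labels is an assumption (some evaluation scripts use only the gold labels).

```python
from typing import List


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical relation labels, for illustration only.
gold = ["born_in", "born_in", "father_of", "serves"]
pred = ["born_in", "father_of", "father_of", "serves"]
score = macro_f1(gold, pred)
print(round(score, 4))
```

A gain of 3.5 percentage points in this metric means the averaged score rises by 0.035 in absolute terms, not by 3.5% relative to the baseline.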

English keywords:

Ancient Chinese; Relation Extraction; Relation Classification; Pre-trained Language Model; Entity Information

Authors:

Tang Xuemei, Su Qi, Wang Jun

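Affiliations:

Department of Information Management, Peking University, Beijing 100871

Research Center for Digital Humanities, Peking University, Beijing 100871

School of Foreign Languages, Peking University, Beijing 100871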

Keywords:

Ancient Chinese; relation extraction; relation classification; pre-trained language model; entity information

Funding:

Key International Cooperation Project of the National Natural Science Foundation of China

Project number:

72010107003

Publication year:

2024

DOI:

10.11925/infotech.2096-3467.2022.1367

Data Analysis and Knowledge Discovery

National Science Library, Chinese Academy of Sciences

Indexed in: CSTPCD, CSSCI, CHSSCD, Peking University Core Journals, EI

Impact factor: 1.452

ISSN: 2096-3467

Year, volume (issue): 2024, 8(1)

Number of references: 34