Chinese High Technology Letters (高技术通讯), 2024, Vol. 34, Issue 2: 111-122. DOI: 10.3772/j.issn.1002-0470.2024.02.001

Fine-grained entity type classification using type semantic representation for noisy label reduction

Xi Pengbi (席鹏弼)¹, Jin Xiaolong (靳小龙)¹, Bai Shuo (白硕)², Cheng Xueqi (程学旗)¹

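Author information

  • 1. Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100408
  • 2. Hundsun Technologies Inc., Hangzhou 310053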

Abstract

Training data for fine-grained entity typing (FET) is usually generated by distant supervision from existing knowledge bases, a process that inevitably introduces redundant, noisy type labels. Existing work that addresses this noise typically models only the probability distribution over the training data and the annotated types, and learns too little about the semantics of the fine-grained types themselves. As a result, for training instances annotated with multiple fine-grained types, types unrelated to the entity context may be selected for model learning. This paper proposes a fine-grained entity classification method that reduces label noise using semantic representations of fine-grained types. First, it learns representations of some fine-grained types from training instances that have a unique fine-grained type path, and then learns representations of the remaining types by incorporating relationship information between fine-grained types. Second, for each training instance annotated with fine-grained types, it selects the annotated type most similar to the semantic information of the entity context as the target type, and selects the Top-K similar instances from the dataset to aggregate fine-grained type semantic information. Finally, it learns the final fine-grained entity classification model on the aggregated information. Experimental results show that the method effectively selects, from the fine-grained types annotated in the training data, the type that best matches the semantics of the entity context, thereby improving the accuracy of fine-grained entity classification.
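The selection and retrieval steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and the function names (`select_target_type`, `top_k_similar`), the type paths, and the two-dimensional toy vectors are all hypothetical. How the type and context representations are learned, and how the Top-K neighbours are aggregated, is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_target_type(context_vec, candidate_types, type_vecs):
    """Pick, among the distantly supervised candidate labels, the type whose
    representation is most similar to the entity-context representation."""
    return max(candidate_types, key=lambda t: cosine(context_vec, type_vecs[t]))


def top_k_similar(query_vec, pool_vecs, k):
    """Return indices of the k pool vectors most similar to the query."""
    ranked = sorted(
        range(len(pool_vecs)),
        key=lambda i: cosine(query_vec, pool_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy data: hypothetical type representations and one entity context.
type_vecs = {
    "/person/athlete": [1.0, 0.0],
    "/person/politician": [0.0, 1.0],
}
context_vec = [0.9, 0.1]
noisy_labels = ["/person/athlete", "/person/politician"]

target = select_target_type(context_vec, noisy_labels, type_vecs)

# Hypothetical pool of other context representations from the dataset.
pool = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]]
neighbors = top_k_similar(context_vec, pool, k=2)
```

In this toy setup the context vector lies close to the athlete type, so that label is kept as the target and the politician label is treated as noise; the two nearest pool entries would then feed the aggregation step.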

Key words

entity typing / fine-grained type / multi-label noise reduction / multi-label classification

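Funding

National Natural Science Foundation of China (U1911401)

National Natural Science Foundation of China (61772501)

National Natural Science Foundation of China (62002341)

National Natural Science Foundation of China (U1836206)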

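Publication year

2024
Chinese High Technology Letters (高技术通讯)
Institute of Scientific and Technical Information of China

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.19
ISSN: 1002-0470
Number of references: 29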