Fine-Grained Entity Typing with Label Denoising via Type Semantic Representations

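Abstract (translated from Chinese): Training data for fine-grained entity typing (FET) is usually generated by distant supervision from existing knowledge bases, a process that inevitably introduces spurious noisy labels. Existing work that addresses noise in training data typically models only the probability distribution over training data and annotated types, and learns too little about the semantics of fine-grained types. As a result, on training instances annotated with multiple fine-grained types, models are trained on types unrelated to the entity context. This paper proposes a fine-grained entity typing method that denoises labels using semantic representations of fine-grained types. First, it learns representations for some fine-grained types from training instances that have a unique fine-grained type path, then learns representations for the remaining types by incorporating the relations between fine-grained types. Second, for each training instance annotated with fine-grained types, it selects as the target type the one most similar to the semantic information of the entity context, and selects the Top-K similar instances from the dataset to aggregate fine-grained type semantic information. Finally, it learns the final fine-grained entity typing model on the aggregated information. Experimental results show that the method effectively selects, from the annotated training data, the fine-grained type that best matches the semantics of the entity context, thereby improving fine-grained entity typing accuracy.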

English title: Fine grained entity type classification using type semantic representation for noisy label reduction

English abstract: The training data for fine-grained entity typing (FET) is usually generated by distant supervision from a knowledge base, a process that inevitably introduces noisy type labels. Existing work mostly models the probability distribution of the training data and annotation types and lacks semantic learning of fine-grained types, so types unrelated to the entity context are used during model learning. This paper proposes a fine-grained entity classification method for label noise reduction based on the semantic representation of fine-grained types. First, it learns the representations of some fine-grained types from the data with a unique fine-grained type path in the training set, and learns the representations of the remaining fine-grained types by incorporating the relationship information between fine-grained types. Second, it selects, from the set of fine-grained types annotated for each training instance, the type most similar to the semantic information of the entity context as the target type, and then selects the Top-K similar sentences from the dataset to aggregate fine-grained semantic information. Last, it learns the final fine-grained entity classification model on the aggregated information. Experimental results and analysis on datasets demonstrate that the model effectively selects the fine-grained type that best matches the semantic information of the entity context from the set of fine-grained types annotated in the training data, and thereby improves the accuracy of fine-grained entity classification.
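The second step of the abstract (choosing the best-matching type among the noisy labels, then pooling the Top-K similar instances) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name a similarity measure or an aggregation rule, so cosine similarity and mean pooling are assumptions, and the type names, vectors, and function names below are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def select_target_type(context_vec, candidate_types, type_vecs):
    """Pick the annotated type whose representation is closest to the context.
    Assumption: closeness is measured by cosine similarity."""
    return max(candidate_types, key=lambda t: cosine(context_vec, type_vecs[t]))

def top_k_neighbors(context_vec, dataset_vecs, k):
    """Return indices of the k dataset vectors most similar to the context."""
    ranked = sorted(range(len(dataset_vecs)),
                    key=lambda i: cosine(context_vec, dataset_vecs[i]),
                    reverse=True)
    return ranked[:k]

def aggregate(context_vec, dataset_vecs, neighbor_ids):
    """Average the context vector with its neighbors.
    Assumption: mean pooling stands in for the paper's aggregation."""
    vecs = [context_vec] + [dataset_vecs[i] for i in neighbor_ids]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# Hypothetical toy data: two type representations and one entity context.
type_vecs = {
    "/person/athlete": [1.0, 0.0],
    "/person/politician": [0.0, 1.0],
}
context_vec = [0.9, 0.1]
noisy_labels = ["/person/athlete", "/person/politician"]

# Denoising: keep only the label that best matches the context.
target = select_target_type(context_vec, noisy_labels, type_vecs)

# Top-K retrieval and aggregation over a hypothetical dataset of context vectors.
dataset_vecs = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]]
neighbors = top_k_neighbors(context_vec, dataset_vecs, k=2)
pooled = aggregate(context_vec, dataset_vecs, neighbors)
```

In this toy setup the context lies closest to the first type vector, so the second label is treated as noise. A real system would obtain the vectors from a trained encoder rather than hand-written lists.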

English keywords:

entity typing; fine grained type; multi-label noise reduction; multi-labels classification

Authors:

Xi Pengbi (席鹏弼), Jin Xiaolong (靳小龙), Bai Shuo (白硕), Cheng Xueqi (程学旗)

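Author affiliations:

Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190

School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100408

Hundsun Technologies Inc., Hangzhou 310053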

Keywords (translated from Chinese):

entity typing; fine-grained type; multi-label denoising; multi-label classification

Funding:

National Natural Science Foundation of China (four grants)

Grant numbers:

U1911401; 61772501; 62002341; U1836206

Publication year:

2024

DOI:

10.3772/j.issn.1002-0470.2024.02.001

高技术通讯 (Chinese High Technology Letters)

Institute of Scientific and Technical Information of China (中国科学技术信息研究所)

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.19

ISSN: 1002-0470

Year, volume (issue): 2024, 34(2)

Number of references: 29