Chinese Named Entity Recognition Enhanced with Boundary Information and Lexical Information

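Abstract: Chinese named entity recognition (NER) is a natural language processing (NLP) technique that extracts entity pairs and is widely used in knowledge graph construction and information extraction tasks. Traditional Chinese NER methods focus mainly on analyzing character information and neglect important aspects such as location and word features, which hinders accurate identification of entity boundaries. This paper introduces an enhanced Chinese NER model that places strong emphasis on boundary and word information in order to calibrate entity boundaries precisely. First, multi-level text features are constructed as the model input. Next, a strategy that fuses location information with category description information is proposed to strengthen semantic representation. Finally, a conditional random field (CRF) model maps the enhanced feature vectors to sequence label outputs so that all entities and their category labels are extracted accurately. On the existing datasets OntoNotes, Resume, and Weibo, the model improves the F1 score by 0.82%, 0.78%, and 1.51%, respectively, which verifies its effectiveness.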

English title: Named Entity Recognition Based on Boundary Information and Word Information Enhancement

English abstract: Chinese named entity recognition (NER) is a natural language processing (NLP) technology that extracts entity pairs and is widely used in knowledge graph construction and information extraction tasks. Traditional Chinese NER methods mainly emphasize character-level analysis but ignore important aspects such as location and word features, which hinders the accurate identification of entity boundaries. This paper introduces an enhanced Chinese NER model that places greater emphasis on both boundary and word information to enable precise calibration of entity boundaries. First, multi-level text features are constructed as the input of the model. Then, a strategy for integrating location information and category description information is proposed to enhance semantic representation. Finally, a conditional random field (CRF) model maps the enhanced feature vectors to the sequence label output so that all entities and category labels are extracted accurately. On the existing datasets OntoNotes, Resume, and Weibo, the proposed model improves the F1 score by 0.82%, 0.78%, and 1.51%, respectively, which supports its effectiveness.
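
The final step in the abstract, where a CRF maps per-character features to a label sequence from which entities are read off, can be illustrated with a minimal sketch. This is not the authors' implementation: the BIO tag scheme, the score dictionaries, and the function names `viterbi_decode` and `extract_entities` are assumptions made for illustration, and start/end transition scores that a full CRF usually includes are omitted for brevity.

```python
from typing import Dict, List, Tuple


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[Tuple[str, str], float],
                   labels: List[str]) -> List[str]:
    """Return the highest-scoring label path for a linear-chain CRF.

    emissions[i][y] is the score of label y at character i (hypothetical
    output of the feature encoder); transitions[(p, y)] is the score of
    moving from label p to label y, defaulting to 0.0 when absent.
    """
    if not emissions:
        return []
    # score[y]: best score of any path that ends in label y so far
    score = {y: emissions[0][y] for y in labels}
    backpointers: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            best_prev = max(
                labels, key=lambda p: score[p] + transitions.get((p, y), 0.0))
            new_score[y] = (score[best_prev]
                            + transitions.get((best_prev, y), 0.0)
                            + emit[y])
            pointers[y] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label to the start
    path = [max(labels, key=lambda y: score[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def extract_entities(chars: str,
                     tags: List[str]) -> List[Tuple[str, str, int, int]]:
    """Collect (text, type, start, end) spans from BIO tags; end is exclusive."""
    entities = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.append((chars[start:i], etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


if __name__ == "__main__":
    tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
    print(extract_entities("陈磊在上海", tags))
```

The transition scores are what let a CRF enforce boundary consistency: a strongly negative score for `("O", "I-LOC")`, for example, keeps an entity from starting with an inside tag even when the per-character scores favor it.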

English keywords:

named entity recognition; location information; category description information; multi-level text features

Authors:

Sun Zhengyan (孙争艳), Chen Lei (陈磊), Wei Subo (魏苏波), Chen Baoguo (陈宝国)

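Author affiliations:

School of Computer Science, Huainan Normal University, Huainan 232038, Anhui, China

School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China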

Keywords:

named entity recognition; location information; category description information; multi-level text features

Publication year:

2024

DOI:

10.3969/j.issn.1672-1292.2024.04.008

Journal of Nanjing Normal University (Engineering and Technology Edition)

Nanjing Normal University

Impact factor: 0.313

ISSN: 1672-1292

Year, volume (issue): 2024, 24(4)