Named Entity Recognition Based on Boundary Information and Word Information Enhancement
Chinese named entity recognition (NER) is a natural language processing (NLP) technique that extracts entities from text and is widely used in knowledge graph construction and information extraction. Traditional Chinese NER methods focus mainly on character-level information and neglect important cues such as location and word features, which hinders the accurate identification of entity boundaries. This paper proposes an enhanced Chinese NER model that emphasizes both boundary and word information in order to calibrate entity boundaries precisely. First, multi-level text features are constructed as the model input. Then, a strategy that fuses location information with category description information is proposed to strengthen the semantic representation. Finally, a conditional random field (CRF) model maps the enhanced feature vectors to a sequence of labels, so that all entities and their category labels are extracted accurately. On the existing datasets OntoNotes, Resume, and Weibo, the model improves the F1 score by 0.82%, 0.78%, and 1.51%, respectively, which verifies its effectiveness.
Keywords: named entity recognition; location information; category description information; multi-level text features
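To illustrate the final step described in the abstract, in which a CRF maps per-character feature scores to a label sequence, the following is a minimal sketch of Viterbi decoding for a linear-chain CRF. It is not the implementation used in the paper: the function name `viterbi_decode`, the three-label tag set, and all emission and transition scores are hypothetical values chosen only to make the example self-contained.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label index sequence for a linear-chain CRF.

    emissions[t][j]   : score of label j at character position t
    transitions[i][j] : score of moving from label i to label j
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    # best[j] holds the best score of any path that ends in label j so far.
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_labels):
            # Choose the previous label that maximizes the path score into j.
            prev = max(range(num_labels),
                       key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label to recover the full path.
    last = max(range(num_labels), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical tag set and scores (assumptions for illustration only).
LABELS = ["O", "B-PER", "I-PER"]
FORBIDDEN = -1e4  # large negative score blocks the invalid O -> I-PER move
TRANSITIONS = [
    [0.0, 0.0, FORBIDDEN],  # from O
    [0.0, 0.0, 0.5],        # from B-PER
    [0.0, 0.0, 0.5],        # from I-PER
]
EMISSIONS = [
    [0.1, 2.0, 0.3],  # character 1
    [0.2, 0.1, 1.5],  # character 2
    [2.0, 0.1, 0.4],  # character 3
]

decoded = [LABELS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)]
print(decoded)  # ['B-PER', 'I-PER', 'O']
```

In this toy setting the transition scores encode boundary constraints (an inside tag cannot follow an outside tag), which is one way a CRF layer can help keep predicted entity boundaries consistent.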