
Named entity recognition based on span and category enhancement for Chinese news

In the news domain, named entity recognition is complicated by complex syntactic structures and long entity names, which make entity boundaries hard to determine and cause sequence labeling methods to break off prematurely when predicting long entities. To address these challenges, a model named SpaCE (span and category enhancement for Chinese news named entity recognition) is proposed. The model builds on BERT (bidirectional encoder representations from Transformers), a pre-trained model, and improves recognition performance through span prediction and category description enhancement. When encoding news text, the model incorporates category descriptions to enrich semantic knowledge, and it adopts span-based decoding to address the interruption of long-entity predictions. In addition, word boundary information is introduced through precise labeling, and the entity matching strategy is optimized, which effectively reduces the non-entity matches caused by span decoding. Compared with the baseline models, SpaCE improves performance on all three datasets. SpaCE also retains strong named entity recognition capability on unordered text, indicating good robustness.

news named entity recognition; BERT; span; category enhancement; word boundary information

Qi Ruiyan (祁瑞艳), Li Longjie (李龙杰), Xu Shicheng (徐世琤), Ma Ligong (马笠恭), Ma Zhixin (马志新)


School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, Gansu, China

Lanzhou Municipal Public Security Bureau, Lanzhou 730000, Gansu, China


2024

Chinese Journal of Intelligent Science and Technology (智能科学与技术学报)


CSTPCD
ISSN:
Year, volume (issue): 2024, 6(4)