Chinese named entity recognition based on enhancing lexicon knowledge integration utilizing character context information

Chinese named entity recognition (NER) is a challenging task because Chinese text lacks explicit delimiters, which leaves the model without word boundary information. Existing mainstream models address this issue by introducing a lexicon, which supplies word boundary information. However, the word information in the lexicon is fused into the character representations solely according to the matching relation between characters and words, without considering the influence of sentence information on word selection. This results in the introduction of words that are unrelated to the sentence semantics, leading the model to perceive incorrect word boundary information. To reduce the impact of irrelevant words on entity recognition results, this paper proposes a novel Chinese NER method, called ELKI, which enhances the integration of lexicon knowledge by using character context representations that carry sentence semantic information, thereby improving the accuracy of word boundary perception. Specifically, a novel relation-aware character-word cross-attention network is designed to mine, from the lexicon, word information that is related to the semantic information. Then, a gated fusion network is constructed to dynamically fuse the lexicon knowledge representation of each character with its context representation. Experimental results on three benchmark datasets, Resume, MSRA and OntoNotes, show that the proposed method outperforms the other baseline models.
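The two components named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the relation-aware part of the attention is omitted, and the scaled dot-product scoring, the sigmoid gate, the embedding size, the toy lexicon, and all function and variable names (`match_words`, `cross_attention`, `gated_fusion`, `W_g`, `b_g`) are assumptions made only for illustration. Random vectors stand in for the learned character context and word representations.

```python
import numpy as np

def match_words(sentence, lexicon):
    """Return (word, start, end) for every lexicon word found in the sentence."""
    spans = []
    for word in lexicon:
        start = sentence.find(word)
        while start != -1:
            spans.append((word, start, start + len(word) - 1))
            start = sentence.find(word, start + 1)
    return spans

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(char_ctx, word_emb, mask):
    """Each character (query) attends only to the lexicon words that cover it.

    char_ctx: (n, d) character context representations
    word_emb: (m, d) matched-word representations
    mask:     (n, m) True where word j covers character i
    """
    d = char_ctx.shape[-1]
    scores = char_ctx @ word_emb.T / np.sqrt(d)       # assumed scoring function
    scores = np.where(mask, scores, -1e9)             # hide non-matching words
    weights = softmax(scores, axis=-1)
    weights = weights * mask.any(axis=-1, keepdims=True)  # no match -> zero vector
    return weights @ word_emb, weights

def gated_fusion(char_ctx, lex_repr, W_g, b_g):
    """Mix context and lexicon representations with a per-dimension sigmoid gate."""
    z = np.concatenate([char_ctx, lex_repr], axis=-1) @ W_g + b_g
    gate = 1.0 / (1.0 + np.exp(-z))
    return gate * char_ctx + (1.0 - gate) * lex_repr

# Toy example: an ambiguous sentence and a hypothetical lexicon.
sentence = "南京市长江大桥"
lexicon = ["南京", "南京市", "市长", "长江", "长江大桥", "大桥"]
spans = match_words(sentence, lexicon)

n, m, d = len(sentence), len(spans), 8
mask = np.zeros((n, m), dtype=bool)
for j, (_, start, end) in enumerate(spans):
    mask[start:end + 1, j] = True

rng = np.random.default_rng(0)
char_ctx = rng.normal(size=(n, d))   # stand-in for sentence-aware character encodings
word_emb = rng.normal(size=(m, d))   # stand-in for pretrained word embeddings
W_g = rng.normal(size=(2 * d, d)) * 0.1
b_g = np.zeros(d)

lex_repr, weights = cross_attention(char_ctx, word_emb, mask)
fused = gated_fusion(char_ctx, lex_repr, W_g, b_g)
```

Because the query is the character's context representation rather than a fixed match indicator, words such as "市长" receive a weight that depends on the sentence, which is the intuition the abstract describes. The gate then decides, per dimension, how much lexicon information to admit into each character's final representation.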

Chinese named entity recognition; Cross-attention network; Gated fusion network; Information extraction

Zhao Zhenyu, Zhu Jingjing, Zhang Yuxin, Liu Mengzhu, Chen Li, Ju Shenggen

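College of Computer Science, Sichuan University, Chengdu 610065

School of Computer and Information Engineering, Guizhou University of Commerce, Guiyang 550014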

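Key Program of the National Natural Science Foundation of China (62137001); Sichuan Province Key Research and Development Program (2023YFG0265)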

2024

Journal of Sichuan University (Natural Science Edition)
Sichuan University

CSTPCD; Peking University Core Journal
Impact factor: 0.358
ISSN: 0490-6756
Year, volume (issue): 2024, 61(4)