Chinese Named Entity Recognition with Fusion of Lexicon Information and Sentence Semantics

Wang Tan (王谭)¹, Chen Jinguang (陈金广)¹, Ma Lili (马丽丽)¹

Author information

1. School of Computer Science, Xi'an Polytechnic University, Xi'an, Shaanxi 710048

Abstract

Named entity recognition (NER) has improved markedly with the rapid development of deep learning. However, the strong performance of deep learning models depends heavily on large amounts of labeled data; on small datasets with few labeled samples, such models struggle to extract deep features, and recognition quality suffers. To address this problem, we propose LS-NER, a Chinese NER model that fuses lexicon information with sentence semantics. First, the potential words that each character matches in a lexicon are supplied to the model as prior lexical information, which mitigates the Chinese word segmentation problem. Second, sentence embeddings that carry semantic information, normally used to compute text similarity, are applied to the NER task to help the model find similar entities in similar sentences. Finally, an attention-based feature fusion method is designed so that the model can fully learn the semantic information carried by the sentence embeddings. Experiments show that the model performs well on the small datasets Resume and Weibo. Without adding any other external information, sentence semantics help the model learn deeper features: compared with the same model without sentence information, F1 improves by 0.15 and 2.26 percentage points on Resume and Weibo, respectively.
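The first and third steps of the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract gives no architectural details, so everything here is an assumption. `match_lexicon` groups matched words per character into B/M/E/S sets in the spirit of SoftLexicon, and `fuse_with_sentence` is one possible attention-style fusion in which each character vector attends over itself and a sentence vector. The function names, the toy lexicon, the `max_word_len` limit, and the two-dimensional vectors are all hypothetical.

```python
import math


def match_lexicon(sentence, lexicon, max_word_len=4):
    """For every character, collect the lexicon words it begins (B),
    sits inside (M), ends (E), or forms on its own (S)."""
    word_sets = [{"B": set(), "M": set(), "E": set(), "S": set()}
                 for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                word_sets[start]["S"].add(word)
                continue
            word_sets[start]["B"].add(word)
            word_sets[end - 1]["E"].add(word)
            for mid in range(start + 1, end - 1):
                word_sets[mid]["M"].add(word)
    return word_sets


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fuse_with_sentence(char_vecs, sent_vec):
    """Assumed fusion: each character vector is the query, and it attends
    over two candidates (itself and the sentence vector) with scaled
    dot-product scores. The output keeps the input dimension."""
    scale = math.sqrt(len(sent_vec))
    fused = []
    for h in char_vecs:
        w_char, w_sent = softmax([dot(h, h) / scale, dot(h, sent_vec) / scale])
        fused.append([w_char * a + w_sent * b for a, b in zip(h, sent_vec)])
    return fused


# Toy lexicon and sentence, chosen only for illustration.
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
word_sets = match_lexicon("南京市长江大桥", lexicon)

# Toy character vectors and sentence vector (hypothetical values).
fused = fuse_with_sentence([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

In this toy sentence the character 长 begins 长江 and 长江大桥 but also ends 市长, so the model receives both segmentation hypotheses instead of committing to one. In a real system the character vectors would come from an encoder such as BERT and the sentence vector from a sentence-embedding model, and the fusion weights would be learned rather than computed from raw dot products.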

Key words

named entity recognition/BERT/SoftLexicon/Sentence-BERT/conditional random field (CRF)

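Funding

Natural Science Basic Research Program of Shaanxi Province (2023-JC-YB-568)

Scientific Research Program of the Shaanxi Provincial Department of Education (22JP028)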

Publication year

2024

Computer and Modernization (计算机与现代化)

Jiangxi Computer Society; Jiangxi Institute of Computing Technology

CSTPCD

Impact factor: 0.472

ISSN: 1006-2475

Number of references: 27
