Computer Engineering & Science, 2024, Vol. 46, Issue 8: 1473-1481. DOI: 10.3969/j.issn.1007-130X.2024.08.016

A Chinese named entity recognition model based on multi-feature fusion embedding

Liu Xiaohua (刘晓华) 1, Xu Ruzhi (徐茹枝) 1, Yang Chengyue (杨成月) 2

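Author information

  • 1. School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
  • 2. Big Data Center, State Grid Corporation of China, Beijing 100052, China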

Abstract

To address the glyph variation among Chinese characters and the ambiguity of Chinese word boundaries, this paper proposes a Chinese named entity recognition model based on multi-feature fusion embedding. In addition to extracting semantic features, the model captures glyph features with a convolutional neural network and a multi-head self-attention mechanism, and obtains word features by looking up a word-vector embedding table. A bidirectional long short-term memory network then learns long-distance contextual representations, and a conditional random field learns the constraints among the labels of a sentence sequence to produce the final entity labels. The model achieves F1 scores of 96.66%, 70.84%, and 96.15% on the Resume, Weibo, and People Daily datasets, respectively, demonstrating that it effectively improves performance on Chinese named entity recognition.
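The role of the conditional random field layer can be illustrated with a small decoding sketch. The label set, the emission scores, and the function names below are assumptions made for illustration and do not come from the paper; in the described model the per-character scores would come from the BiLSTM and the transition scores would be learned rather than hand-set. The sketch only shows how transition constraints can rule out a label sequence that a per-character argmax would otherwise pick.

```python
# Illustrative only: labels, scores, and function names are assumptions,
# not taken from the paper.
NEG_INF = float("-inf")
LABELS = ["O", "B-PER", "I-PER"]


def transition_score(prev, curr):
    """Hand-set stand-in for learned CRF transition scores.

    The only constraint encoded here: I-PER may not directly follow O.
    """
    if prev == "O" and curr == "I-PER":
        return NEG_INF
    return 0.0


def start_score(label):
    """A sequence may not start inside an entity."""
    return NEG_INF if label == "I-PER" else 0.0


def viterbi(emissions):
    """Return the highest-scoring label path under the constraints.

    emissions: list of dicts mapping label -> score for one character.
    """
    # best[label] = (score of the best path ending in label, that path)
    best = {l: (start_score(l) + emissions[0][l], [l]) for l in LABELS}
    for emit in emissions[1:]:
        new_best = {}
        for curr in LABELS:
            score, path = max(
                ((best[prev][0] + transition_score(prev, curr), best[prev][1])
                 for prev in LABELS),
                key=lambda item: item[0],
            )
            new_best[curr] = (score + emit[curr], path + [curr])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical emission scores for a three-character sentence.
emissions = [
    {"O": 2.0, "B-PER": 0.1, "I-PER": 0.0},
    {"O": 0.2, "B-PER": 1.0, "I-PER": 1.5},
    {"O": 0.1, "B-PER": 0.3, "I-PER": 2.0},
]

# Per-character argmax ignores label constraints.
greedy = [max(e, key=e.get) for e in emissions]
# Viterbi decoding respects them.
decoded = viterbi(emissions)

print(greedy)   # ['O', 'I-PER', 'I-PER']  (invalid: I-PER follows O)
print(decoded)  # ['O', 'B-PER', 'I-PER']
```

With these toy scores, the greedy path places I-PER directly after O, which is not a valid BIO sequence, while the constrained decoder selects B-PER as the entity start instead.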

Key words

named entity recognition / feature fusion / multi-head self-attention mechanism


Funding

National Natural Science Foundation of China (61972148)

Publication year

2024
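Computer Engineering & Science
College of Computer Science, National University of Defense Technology

CSTPCD; Peking University Core Journal
Impact factor: 0.787
ISSN: 1007-130X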