Chinese Named Entity Recognition Method Based on Matching Word Weight Optimization (基于匹配词权重优化的中文命名实体识别方法)

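Abstract: Named entity recognition is an important task in knowledge extraction. To make more effective use of lexicon matching information, a Chinese named entity recognition model based on matching word weight optimization is proposed. First, a pre-trained model and a word segmentation tool are used to obtain a vector representation and a part-of-speech tag for each character. Next, candidate words are matched against a lexicon and weighted by an optimized weight derived from each matched word's frequency and document count; these weighted word features are combined with the character vectors to give a multi-feature fused representation of each character. Finally, a Bi-directional Long Short-Term Memory (Bi-LSTM) network is trained on these representations, and a Conditional Random Field (CRF) performs label inference to identify the entities. Experimental results show that the model reaches F1 scores of 95.55% on the Resume dataset and 85.39% on the Movie-Music-Book (影视-音乐-书籍) dataset, improving Chinese named entity recognition performance.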
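The abstract describes the lexicon-matching and weighting step but does not give the weighting formula. The sketch below shows one way that step could look, under stated assumptions: it groups each character's matched words into Begin/Middle/End/Single (BMES) sets and normalises by the total weight over all four sets, as in the SoftLexicon approach, and it swaps in a TF-IDF-style weight built from word frequency and document count as a stand-in for the paper's optimized weight. The toy lexicon, its statistics, and the helper names `match_words`, `word_weight`, and `fuse` are hypothetical; the part-of-speech features, the pre-trained character vectors, and the Bi-LSTM-CRF layers are omitted.

```python
import math
import random

# Hypothetical toy lexicon: word -> (frequency in corpus, number of documents containing it).
# These statistics are invented for illustration only.
LEXICON = {
    "南京": (30, 12),
    "南京市": (20, 9),
    "市长": (15, 10),
    "长江": (25, 11),
    "长江大桥": (8, 4),
    "大桥": (12, 7),
}
TOTAL_DOCS = 50  # hypothetical number of documents N in the corpus
SETS = "BMES"


def match_words(sentence, lexicon, max_len=4):
    """For each character, collect the lexicon words in which it is the
    Begin, Middle, or End character, or the whole word (Single)."""
    sets = [{s: [] for s in SETS} for _ in sentence]
    n = len(sentence)
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if j - i == 1:
                sets[i]["S"].append(word)
                continue
            sets[i]["B"].append(word)
            for k in range(i + 1, j - 1):
                sets[k]["M"].append(word)
            sets[j - 1]["E"].append(word)
    return sets


def word_weight(word, lexicon=LEXICON, total_docs=TOTAL_DOCS):
    """ASSUMED weight: a TF-IDF-style mix of frequency and document count.
    The paper's actual optimized formula is not given in the abstract."""
    freq, doc_count = lexicon[word]
    return freq * math.log((1 + total_docs) / (1 + doc_count))


def fuse(char_vec, char_sets, word_emb, weight=word_weight):
    """Concatenate the character vector with four weighted word-set vectors,
    scaled by 4 / Z where Z is the total weight over all four sets."""
    dim = len(next(iter(word_emb.values())))
    z = sum(weight(w) for s in SETS for w in char_sets[s])
    fused = list(char_vec)
    for s in SETS:
        acc = [0.0] * dim
        for w in char_sets[s]:
            ww = weight(w)
            for d in range(dim):
                acc[d] += ww * word_emb[w][d]
        if z > 0:
            acc = [4.0 * a / z for a in acc]
        fused.extend(acc)  # an empty set contributes a zero vector
    return fused


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    # Random toy embeddings stand in for pre-trained word and character vectors.
    word_emb = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in LEXICON}
    sentence = "南京市长江大桥"
    char_emb = {c: [rng.uniform(-1, 1) for _ in range(dim)] for c in sorted(set(sentence))}
    for ch, sets in zip(sentence, match_words(sentence, LEXICON)):
        vec = fuse(char_emb[ch], sets, word_emb)
        print(ch, {s: sets[s] for s in SETS if sets[s]}, len(vec))
```

Running the script prints, for each character of 南京市长江大桥, its non-empty BMES word sets and the fused vector length (4 + 4 × 4 = 20 with the toy 4-dimensional embeddings). In a full model these fused vectors would be the Bi-LSTM input, with a CRF decoding the tag sequence.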

Keywords:

named entity recognition; recurrent neural network; conditional random field; dictionary matching; weight optimization

Authors:

Dai Gaoyang (戴高阳), Meng Xiaoyan (孟小艳), Zhang Rongzhen (张容祯), Chen Yanhong (陈燕红), Wang Yang (汪洋)

Affiliation:

College of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi 830052

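Funding:

Natural Science Foundation of Xinjiang Uygur Autonomous Region; Key Research and Development Program of Xinjiang Uygur Autonomous Region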

Grant numbers:

2019D01A50; 2017B01006-1

Publication year:

2024

DOI:

10.3969/j.issn.1672-9722.2024.02.041

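Journal: Computer & Digital Engineering (计算机与数字工程)

Sponsoring institution: 709th Research Institute of China Shipbuilding Industry Corporation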

Indexed in: CSTPCD

Impact factor: 0.355

ISSN: 1672-9722

Year, volume (issue): 2024, 52(2)

Number of references: 17