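Named entity recognition is one of the important tasks in knowledge extraction. To make more effective use of dictionary matching information, a Chinese named entity recognition model based on matching word weight optimization is proposed. First, a pre-trained model and a word segmentation tool are used to obtain the vector representation and part-of-speech tag of each character. Then, candidate words are matched in the dictionary and weighted by an optimized weight derived from each matched word's frequency and document count, and the weighted words are combined with the character vectors to give a multi-feature fusion representation of each character. Finally, a bi-directional long short-term memory (Bi-LSTM) network is used for training, and a conditional random field (CRF) performs label inference to produce the recognized entities. Experimental results show that the model achieves F1 scores of 95.55% and 85.39% on the Resume and Movie-Music-Book datasets, respectively, effectively improving Chinese named entity recognition.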
Chinese Named Entity Recognition Method Based on Matching Word Weight Optimization
Named entity recognition is one of the important tasks in knowledge extraction. In order to make more effective use of lexicon information, a Chinese named entity recognition model based on matching word weight optimization is proposed. First, a pre-trained model and a word segmentation tool are used to obtain the vector representation and part-of-speech tag of each character. Then, potential words are matched in the dictionary, each matched word is weighted according to an optimized weight computed from its word frequency and document count, and the result is combined with the character vector to obtain a multi-feature fusion representation of the character. Finally, a Bi-directional Long Short-Term Memory (Bi-LSTM) network is used for training, and a Conditional Random Field (CRF) is used to complete label inference and obtain the identified entities. The experimental results show that the F1 value of this model on the Resume and Movie-Music-Book datasets reaches 95.55% and 85.39%, respectively, which effectively improves Chinese named entity recognition.
named entity recognition; recurrent neural network; conditional random field; dictionary matching; weight optimization
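
The dictionary matching and weighting step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so `word_weight` below is an assumed TF-IDF-style stand-in (frequency times a log-scaled inverse document count), not the model's actual optimized weight. The names `match_words`, `word_weight`, `fuse_features`, and the layout of `stats` (word mapped to a `(frequency, document_count)` pair) are likewise hypothetical.

```python
import math
from collections import defaultdict


def match_words(sentence, lexicon, max_len=4):
    """Map each character index to the lexicon words (length >= 2) covering it."""
    matches = defaultdict(list)
    n = len(sentence)
    for start in range(n):
        for end in range(start + 2, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                for i in range(start, end):
                    matches[i].append(word)
    return matches


def word_weight(freq, doc_count, num_docs):
    """Assumed weight: frequency scaled by a smoothed inverse document count."""
    return freq * math.log(1.0 + num_docs / doc_count)


def fuse_features(sentence, char_vecs, word_vecs, stats, num_docs, word_dim, max_len=4):
    """Concatenate each character vector with the weighted mean of its matched word vectors."""
    matches = match_words(sentence, stats, max_len)
    fused = []
    for i, ch in enumerate(sentence):
        pooled = [0.0] * word_dim  # stays all-zero when no word covers the character
        words = matches.get(i, [])
        if words:
            weights = [word_weight(*stats[w], num_docs) for w in words]
            total = sum(weights)
            for w, wt in zip(words, weights):
                for k, v in enumerate(word_vecs[w]):
                    pooled[k] += (wt / total) * v
        fused.append(list(char_vecs[ch]) + pooled)
    return fused
```

In a full pipeline, the fused per-character vectors would be the input sequence to the Bi-LSTM, with the CRF layer decoding the label sequence. Normalizing the weights per character keeps the pooled word feature on the same scale regardless of how many dictionary words match.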