Word-Character Model with Low Lexical Information Loss for Chinese NER
Chinese named entity recognition (CNER) is a natural language processing task that aims to recognize entities of specific categories in text, such as names of people, places, and organizations. It is a fundamental underlying task for natural language applications such as question answering systems, machine translation, and information extraction. Since Chinese lacks the natural word boundaries of English, the effectiveness of word-based NER models for Chinese named entity recognition is significantly reduced by word segmentation errors, while character-based NER models ignore the role of lexical information. In recent years, many studies have attempted to incorporate lexical information into character-based models; among them, WC-LSTM achieved significant improvements in model performance by injecting lexical information into the start and end characters of a word. However, this model still does not fully utilize lexical information, so LLL-WCM (word-character model with low lexical information loss) is proposed on this basis to also incorporate lexical information into all intermediate characters of matched lexicon words, avoiding lexical information loss. Meanwhile, two encoding strategies, averaging and a self-attention mechanism, are introduced to extract all lexical information. Experiments are conducted on four Chinese datasets, and the results show that the F1 scores of this method improve by 1.89%, 0.29%, 1.10%, and 1.54%, respectively, compared with WC-LSTM.
Keywords: Named entity recognition; Natural language processing; Lexical information loss; Intermediate characters; Encoding strategy
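The two encoding strategies named in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' released implementation: the function names, the 4-dimensional toy embeddings, and the concatenation step are assumptions made only to show how averaging and scaled dot-product self-attention would summarize the embeddings of all lexicon words matched at one character.

```python
import numpy as np

def average_encoding(word_vecs):
    """Averaging strategy: summarize all matched-word embeddings for one
    character by their element-wise mean (illustrative, not the paper's code)."""
    return np.mean(word_vecs, axis=0)

def self_attention_encoding(word_vecs, char_vec):
    """Self-attention strategy (sketch): weight each matched-word embedding
    by its scaled dot-product similarity to the character embedding, then
    take the softmax-weighted sum over the matched words."""
    d = char_vec.shape[0]
    scores = word_vecs @ char_vec / np.sqrt(d)     # one score per matched word
    weights = np.exp(scores - scores.max())        # numerically stable softmax
    weights /= weights.sum()
    return weights @ word_vecs

# Toy example: a character lying inside two matched lexicon words,
# with hypothetical 4-dimensional embeddings.
rng = np.random.default_rng(0)
char_vec = rng.normal(size=4)
word_vecs = rng.normal(size=(2, 4))  # embeddings of the two matched words

lex_avg = average_encoding(word_vecs)
lex_att = self_attention_encoding(word_vecs, char_vec)

# In a word-character model of this kind, the lexical summary would then be
# combined with the character embedding (here, by concatenation) before the
# sequence encoder.
augmented = np.concatenate([char_vec, lex_att])
```

Either summary gives every character, including intermediate ones, a fixed-size lexical feature regardless of how many lexicon words cover it, which is what allows the model to use all matched words rather than only those starting or ending at the character.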