Chinese named entity recognition integrating Chinese character glyph structure information
This paper presents BCBGAC, an integrated model that improves the recognition accuracy of Chinese named entities by incorporating the glyph structure information of Chinese characters into the character encoding. BCBGAC uses the Wubi method to decompose each Chinese character into its basic root components in writing order. The root components are encoded with the Skip-Gram method, and the resulting encodings are fed into a CNN, which extracts the glyph structure features of the character and generates its glyph structure vector. The glyph structure vector is concatenated with the basic character vector generated by the BERT model to obtain the final Chinese character embedding vector. The character vectors are then fed into a BiGRU network to capture the contextual dependencies among them, and an attention mechanism is introduced to weight the vectors. A CRF decoding layer then produces the best annotation of the entity sequence from the Chinese character vectors. Experimental results on two datasets show that the BCBGAC model recognizes entities more accurately than the baseline model. The F1 value reaches 96.06% and 95.48% on the two datasets, respectively, which verifies the effectiveness of the BCBGAC model in the Chinese named entity recognition task.
Chinese named entity recognition; Glyph structure embedding; BiGRU; Attention mechanism
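
The glyph-embedding step described in the abstract (root components, root encodings, CNN, concatenation with the basic character vector) can be illustrated with a minimal sketch. It is not the authors' implementation. The three-character decomposition table, the vector sizes, the kernel width, and the function names are all hypothetical. Random vectors stand in for Skip-Gram-trained root encodings and for the BERT character vector, and the convolution is a plain one-dimensional convolution with ReLU and max-over-time pooling, chosen only for illustration.

```python
import random

# Hypothetical toy table: character -> root components in writing order.
# A real system would derive this from a full Wubi decomposition resource.
TOY_ROOTS = {
    "明": ["日", "月"],
    "好": ["女", "子"],
    "林": ["木", "木"],
}

ROOT_DIM = 4     # size of each root encoding (assumed)
NUM_FILTERS = 3  # number of convolution filters (assumed)
KERNEL = 2       # convolution window over consecutive roots (assumed)
PAD = "<pad>"    # padding symbol for characters with fewer than KERNEL roots

rng = random.Random(0)


def rand_vec(n):
    """Return a random vector; a stand-in for trained parameters."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


# Stand-in for Skip-Gram root encodings: random vectors, zero vector for padding.
vocab = sorted({r for roots in TOY_ROOTS.values() for r in roots})
root_emb = {r: rand_vec(ROOT_DIM) for r in vocab}
root_emb[PAD] = [0.0] * ROOT_DIM

# Each filter spans KERNEL consecutive root encodings.
filters = [rand_vec(KERNEL * ROOT_DIM) for _ in range(NUM_FILTERS)]


def glyph_vector(char):
    """1-D convolution over the root sequence, ReLU, then max-over-time pooling."""
    roots = list(TOY_ROOTS[char])
    while len(roots) < KERNEL:
        roots.append(PAD)
    seq = [root_emb[r] for r in roots]
    # Flatten each window of KERNEL consecutive root encodings into one vector.
    windows = [sum(seq[i:i + KERNEL], []) for i in range(len(seq) - KERNEL + 1)]
    pooled = []
    for f in filters:
        acts = [max(0.0, sum(w * x for w, x in zip(f, win))) for win in windows]
        pooled.append(max(acts))
    return pooled


def char_embedding(char, base_vec):
    """Concatenate the basic character vector with the glyph structure vector."""
    return list(base_vec) + glyph_vector(char)


if __name__ == "__main__":
    base = rand_vec(8)  # stand-in for a BERT character vector
    print(len(char_embedding("明", base)))
```

In the full model the concatenated vectors would go on to the BiGRU, attention, and CRF layers, which are omitted here.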