Can Phonetics and Orthography Effectively Enhance Chinese Character Representation?
[Objective] This study investigates whether phonetic and orthographic features effectively enhance the representation of Chinese characters. [Methods] Using Named Entity Recognition (NER) as the evaluation task, we built a baseline model with a general embedding module as the embedding layer, a bidirectional LSTM module as the context encoding layer, and a fully connected network with Softmax activation as the decoding layer. We then compared the changes in Micro-F1 scores and entity-specific F1 scores after enhancing the character embeddings with Chinese pinyin, character images, Wubi input codes, Four-Corner codes, Cangjie codes, and radicals, on datasets including MSRA, PeopleDaily, CCKS2017, Resume, and E-Commerce. [Results] Embeddings enhanced with phonetic and orthographic features decreased performance by nearly 0.01 on the MSRA and PeopleDaily datasets, while performance on the CCKS2017, Resume, and E-Commerce datasets showed no statistically significant change. [Limitations] Using only 32×32-pixel images of simplified Chinese characters may affect the extraction of orthographic features. [Conclusions] While phonetic and orthographic features can enhance the representation of Chinese characters, they also introduce noise, and their impact on model performance varies across corpora and entity types.
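The two metrics named in the Methods, Micro-F1 and entity-specific F1, can be illustrated with a minimal sketch. It assumes entity-level exact-match scoring, where a predicted entity counts as correct only if its span and type both match a gold entity; the abstract does not state the matching rule, so this is an assumption. The function names `f1` and `entity_f1`, the tuple layout, and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def entity_f1(gold, pred):
    """Return (micro_f1, per_type_f1) for two sets of entities.

    Each entity is a tuple (sentence_id, start, end, entity_type).
    Exact span-and-type matching is assumed here, not confirmed by the source.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for ent in pred:
        # A prediction is a true positive only if it appears verbatim in gold.
        if ent in gold:
            tp[ent[3]] += 1
        else:
            fp[ent[3]] += 1
    for ent in gold - pred:
        # Gold entities that were never predicted are false negatives.
        fn[ent[3]] += 1

    types = set(tp) | set(fp) | set(fn)
    per_type = {t: f1(tp[t], fp[t], fn[t]) for t in types}
    # Micro-averaging pools the counts over all entity types before scoring.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, per_type


# Toy data (invented for illustration): the LOC prediction has a wrong boundary.
gold = {(0, 0, 2, "PER"), (0, 5, 7, "LOC"), (1, 3, 6, "ORG")}
pred = {(0, 0, 2, "PER"), (0, 5, 8, "LOC"), (1, 3, 6, "ORG")}

micro, per_type = entity_f1(gold, pred)
print(f"Micro-F1: {micro:.4f}")
for entity_type, score in sorted(per_type.items()):
    print(f"{entity_type}: {score:.4f}")
```

Because micro-averaging pools counts across entity types, frequent types dominate the overall score, which is one reason to report entity-specific F1 alongside it when comparing embedding variants.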