Named Entity Recognition of Traditional Chinese Medicine Ancient Records Based on Multi-feature Fusion
Purpose/Significance: To construct a named entity corpus of traditional Chinese medicine (TCM) ancient records, and to improve the recognition accuracy and applicability of general-domain named entity recognition (NER) models when applied to TCM ancient records. Method/Process: Annotation standards for entities in TCM ancient records are formulated, and 2,384 Xin'an medical records are annotated. A RoBERTa-BiLSTM-CRF model is developed: the RoBERTa pre-trained language model generates word vectors carrying semantic features, the BiLSTM layer learns the global semantic features of the sequence, and the CRF layer decodes and outputs the optimal label sequence. Dictionary and rule features are incorporated to enhance the model's ability to recognize entity boundaries and categories. Result/Conclusion: The model achieves good recognition performance on the named entity corpus of Xin'an medical cases. Integrating domain terminology dictionaries and rule-based features improves the overall F1 score to 72.8%.
traditional Chinese medicine (TCM) ancient records; named entity recognition (NER); corpus; dictionary; natural language processing (NLP)
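
The abstract does not specify how the dictionary features are encoded. As a minimal sketch under stated assumptions, one common approach is to tag each character with a dictionary-match label found by forward maximum matching against a domain lexicon. The function name `dictionary_features`, the B/I/E/S/O tag scheme, the `max_len` parameter, the lexicon entries, and the entity labels (`HERB`, `FORMULA`, `SYMPTOM`) below are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def dictionary_features(text: str, lexicon: Dict[str, str], max_len: int = 6) -> List[str]:
    """Tag each character with a dictionary-match feature.

    Uses forward maximum matching: at each position, the longest lexicon
    term starting there (up to max_len characters) wins.
    The tag scheme and labels are illustrative assumptions.
    """
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for j in range(min(len(text), i + max_len), i, -1):
            term = text[i:j]
            if term in lexicon:
                label = lexicon[term]
                if j - i == 1:
                    tags[i] = f"S-{label}"  # single-character term
                else:
                    tags[i] = f"B-{label}"  # first character of the term
                    for k in range(i + 1, j - 1):
                        tags[k] = f"I-{label}"  # interior characters
                    tags[j - 1] = f"E-{label}"  # last character of the term
                i = j  # continue after the matched term
                matched = True
                break
        if not matched:
            i += 1  # no term starts here; leave the "O" tag
    return tags


# Hypothetical lexicon mapping terms to entity categories.
lexicon = {"桂枝": "HERB", "桂枝汤": "FORMULA", "头痛": "SYMPTOM"}
features = dictionary_features("头痛用桂枝汤", lexicon)
print(features)
```

Feature tags of this kind could, for instance, be mapped to embeddings and concatenated with the RoBERTa vectors before the BiLSTM layer. Whether the paper fuses its features this way is an assumption here.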