Research on entity extraction from TCM medical cases based on a terminology dictionary
Objective To extract six categories of entities (symptoms, etiology and pathogenesis, treatment principles, medication, prescriptions, and acupoint selection) from traditional Chinese medicine (TCM) medical cases, so as to lay the foundation for constructing TCM medical case knowledge graphs and for intelligent assistance in TCM diagnosis and treatment. Methods Based on the characteristics of TCM medical case texts, a word segmentation method using a dynamically updatable terminology dictionary was proposed, and its effectiveness was validated on medical cases of TCM neurological disorders as well as on three publicly available datasets: ChineseBLUE/cEHRNER, ChineseBLUE/cMedQANER, and CBLUE/CMeEE. Results The model using the terminology dictionary achieved higher accuracy, precision, recall, and F1 values than the model without it. The F1 values on the test set and validation set were 92.07% and 93.04%, respectively. Conclusion The model integrating the dynamically updatable terminology dictionary segmentation method can enhance the recognition of domain-specific terms and new entities in the TCM field, improve the accuracy of key information identification in TCM medical cases, and promote the inheritance and development of TCM knowledge.
medical cases of Chinese medicine; neurological disorders; terminology dictionary; entity extraction; IDCNN-CRF model
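The abstract does not say how the dynamically updatable terminology dictionary drives word segmentation. As a rough illustration only, the sketch below assumes forward maximum matching against an in-memory term set that can be extended at runtime. The class `TermDictionary`, its methods, and the sample terms are hypothetical and are not taken from the paper. The IDCNN-CRF tagging stage is not shown.

```python
class TermDictionary:
    """Hypothetical terminology dictionary that can be updated at runtime."""

    def __init__(self, terms=()):
        self.terms = set()
        self.max_len = 1
        for term in terms:
            self.add(term)

    def add(self, term):
        """Register a new term so that later segment() calls can match it."""
        if term:
            self.terms.add(term)
            self.max_len = max(self.max_len, len(term))

    def segment(self, text):
        """Forward maximum matching (an assumed strategy, not the paper's)."""
        tokens, i = [], 0
        while i < len(text):
            match = text[i]  # fall back to a single character
            # Try the longest candidate first, down to two characters.
            for size in range(min(self.max_len, len(text) - i), 1, -1):
                candidate = text[i:i + size]
                if candidate in self.terms:
                    match = candidate
                    break
            tokens.append(match)
            i += len(match)
        return tokens


# Illustrative terms: headache, insomnia, hyperactivity of liver yang.
dictionary = TermDictionary(["头痛", "失眠"])
before = dictionary.segment("头痛失眠肝阳上亢")
dictionary.add("肝阳上亢")  # dynamic update with a newly encountered term
after = dictionary.segment("头痛失眠肝阳上亢")

print(before)  # ['头痛', '失眠', '肝', '阳', '上', '亢']
print(after)   # ['头痛', '失眠', '肝阳上亢']
```

Before the update, the unknown term falls apart into single characters. After the update, it is kept as one token, which is the kind of effect the abstract attributes to the dictionary when it reports better recognition of domain-specific terms and new entities.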