Objectives: To address the unclear boundaries and easily confused categories of entities in traditional Chinese medicine (TCM) case records, a combined entity recognition model based on rules, dictionaries, and conditional random fields (CRF) is proposed. Methods: A TCM terminology dictionary was constructed, the textual rules of the case records were analyzed, and feature functions were designed accordingly. The case records were segmented into words with the jieba tool. Five categories of entities in the case records were manually labeled to form the training and validation sets, on which a CRF-based entity recognition model was trained. Finally, the CRF model was evaluated using accuracy, recall, and F1 value to investigate the impact of the dictionary, entity category, and text features on the recognition results. Results: The model reached an F1 value of 83.5%, indicating good recognition performance. Adding the dictionary markedly improved entity recognition. Among the text features, contextual features had the greatest impact on the model's recognition performance. Recognition results differed considerably across entity categories: "prescription" was recognized best, followed by "treatment" and "physical signs", while "syndrome type" and "symptom" were recognized worst. Conclusion: This study provides an effective entity recognition model that can substantially improve the accuracy of entity recognition in TCM case records and offers a useful reference for future research.
Key words
traditional Chinese medicine case records/named entity recognition/traditional Chinese medicine terminology dictionary/conditional random field/feature functions/intelligent traditional Chinese medicine
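
The Methods above combine a terminology dictionary with contextual features as inputs to a CRF. The following is a minimal sketch of how such feature functions might look, not the authors' implementation. The dictionary entries, the feature names, the window size, and the example sentence are all hypothetical, since the abstract does not specify them. The sentence is given already segmented, so jieba is not called here.

```python
from typing import Dict, List

# Hypothetical mini-dictionary mapping terms to entity categories.
# The paper's actual terminology dictionary is not given in the abstract.
TCM_DICT: Dict[str, str] = {
    "头痛": "symptom",
    "脉弦": "physical_sign",
    "肝阳上亢": "syndrome_type",
    "天麻钩藤饮": "prescription",
}


def token_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, str]:
    """Build the feature dict for tokens[i]: the word itself, its dictionary
    label, and the words (with their dictionary labels) within `window`
    positions on either side."""
    word = tokens[i]
    feats = {
        "word": word,
        "length": str(len(word)),
        "dict_label": TCM_DICT.get(word, "O"),  # "O" = not in the dictionary
    }
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"[{offset:+d}]"  # e.g. "[-1]" or "[+1]"
        if 0 <= j < len(tokens):
            feats["word" + key] = tokens[j]
            feats["dict_label" + key] = TCM_DICT.get(tokens[j], "O")
        else:
            # Mark sentence boundaries explicitly.
            feats["word" + key] = "<BOS>" if j < 0 else "<EOS>"
    return feats


def sentence_features(tokens: List[str], window: int = 1) -> List[Dict[str, str]]:
    """Feature dicts for every token in one segmented sentence."""
    return [token_features(tokens, i, window) for i in range(len(tokens))]


# Pre-segmented example sentence (assumed output of a word segmenter).
tokens = ["患者", "头痛", "，", "脉弦", "，", "证属", "肝阳上亢"]
features = sentence_features(tokens)
```

A list of per-token feature dictionaries like `features`, paired with a label sequence of the same length, is the input shape that common CRF toolkits such as sklearn-crfsuite accept. Which toolkit the study used is not stated in the abstract.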