Named Entity Recognition (NER) is a fundamental task in building knowledge graphs and directly affects graph quality. In practice, however, mechanical failure data often contain a large amount of domain-specific vocabulary, and the distribution of entity types is generally imbalanced. As a result, existing general-domain NER methods do not yield satisfactory results. To address these problems, this paper proposes an entity recognition method that combines a Focal Loss function with domain-specific dictionaries. The method improves the cross-entropy loss function by introducing balancing and modulation coefficients for the sample distribution, and it further enhances entity recognition through the fusion of vocabulary features. Experimental results on a self-built dataset of mining hoist machines show that incorporating Focal Loss increases the F1 value by 5.57 percentage points compared with the mainstream Bidirectional Encoder Representations from Transformers (BERT)-Bidirectional Long Short-Term Memory (BiLSTM)-Conditional Random Field (CRF) model. It also outperforms the typical Synthetic Minority Over-sampling Technique (SMOTE) method in handling imbalanced data. Incorporating domain dictionaries further improves the F1 value, which reaches 89.13%.
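The role of the balancing and modulation coefficients can be illustrated with a minimal sketch of the standard Focal Loss form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). This is not the paper's implementation: the function names, the toy probabilities, and the default values alpha = 0.25 and gamma = 2.0 are assumptions chosen for illustration, and the coefficients actually used in the experiments are not stated in this abstract.

```python
import math


def cross_entropy(probs, target):
    """Plain cross-entropy for one token: -log(p_t)."""
    return -math.log(probs[target])


def focal_loss(probs, target, alpha=0.25, gamma=2.0):
    """Focal loss for one token (illustrative sketch, assumed defaults).

    probs  : list of predicted class probabilities
    target : index of the true class
    alpha  : balancing coefficient (a float, or one weight per class)
    gamma  : modulation coefficient; larger values down-weight easy samples
    """
    p_t = probs[target]
    # Allow either a single weight or a per-class list of weights.
    alpha_t = alpha[target] if isinstance(alpha, (list, tuple)) else alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical predictions for two tokens whose true class is index 1.
easy = [0.05, 0.95]  # confidently correct (well-classified sample)
hard = [0.70, 0.30]  # mostly wrong (hard sample)

for name, probs in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(probs, 1)
    fl = focal_loss(probs, 1)
    print(f"{name}: CE={ce:.4f}  FL={fl:.6f}  FL/CE={fl / ce:.6f}")
```

With alpha = 1 and gamma = 0 the function reduces to ordinary cross-entropy. With gamma > 0, the ratio FL/CE is much smaller for the well-classified token than for the hard one, which is how the loss shifts training effort toward hard samples such as under-represented entity types.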
Key words
Named Entity Recognition (NER) / imbalanced data / Focal Loss function / mechanical equipment failure / Bidirectional Long Short-Term Memory (BiLSTM) network / Conditional Random Field (CRF)