Named Entity Recognition of Mechanical Equipment Failure for Imbalanced Data
Named Entity Recognition (NER) is a fundamental task in building knowledge graphs and directly affects graph quality. In practice, however, mechanical failure data often contain a large amount of domain-specific vocabulary, and the distribution of entity types is generally imbalanced. As a result, existing general-domain NER methods do not yield satisfactory results. To address these problems, this paper proposes an entity recognition method that combines the Focal Loss function with domain-specific dictionaries. The method improves the cross-entropy loss function by introducing a balancing coefficient and a modulation coefficient that account for the sample distribution. In addition, entity recognition is enhanced through the fusion of vocabulary features. Experimental results on a self-built dataset of mining hoist machines show that incorporating Focal Loss increases the F1 score by 5.57 percentage points compared with the mainstream Bidirectional Encoder Representations from Transformers (BERT)-Bidirectional Long Short-Term Memory (BiLSTM)-Conditional Random Field (CRF) model. It also outperforms the typical Synthetic Minority Over-sampling Technique (SMOTE) method for handling imbalanced data. Incorporating domain dictionaries improves the F1 score further, to 89.13%.
Named Entity Recognition (NER); imbalanced data; Focal Loss function; mechanical equipment failure; Bi-directional Long Short-Term Memory (BiLSTM) network; Conditional Random Field (CRF)
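To illustrate how a balancing coefficient and a modulation coefficient can reshape the cross-entropy loss, the following is a minimal sketch of the commonly used Focal Loss form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The abstract does not give the exact formulation used in the paper, so the function names, the three-class setup, and the values of alpha and gamma below are illustrative assumptions rather than the authors' settings.

```python
import math


def cross_entropy(probs, target):
    """Standard cross-entropy for a single token: -log(p_t)."""
    return -math.log(probs[target])


def focal_loss(probs, target, alpha, gamma=2.0):
    """Focal Loss for a single token: -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs  : predicted class probabilities (assumed to sum to 1)
    target : index of the true class
    alpha  : per-class balancing coefficients (hypothetical values)
    gamma  : modulation exponent; gamma = 0 with alpha = 1 gives cross-entropy
    """
    p_t = probs[target]
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical setup: class 0 is the frequent non-entity tag,
# classes 1 and 2 are rare entity types.
alpha = [0.25, 1.0, 1.0]

easy = [0.95, 0.03, 0.02]  # frequent class 0 is true and predicted confidently
hard = [0.60, 0.30, 0.10]  # rare class 1 is true but predicted at only 0.30

easy_ce, easy_fl = cross_entropy(easy, 0), focal_loss(easy, 0, alpha)
hard_ce, hard_fl = cross_entropy(hard, 1), focal_loss(hard, 1, alpha)

print(f"easy token: CE={easy_ce:.4f}  FL={easy_fl:.6f}")
print(f"hard token: CE={hard_ce:.4f}  FL={hard_fl:.4f}")
```

In this sketch the well-classified token from the frequent class keeps only a tiny fraction of its cross-entropy loss, while the misclassified rare-class token keeps about half of it, so training gradients are dominated by the hard, under-represented examples.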