Fault Classification and Diagnosis Method for CBTC On-board Signal Equipment Based on Doc2vec-LightGBM
The on-board equipment of a communication-based train control (CBTC) system is an important part of the urban rail transit signal system. During operation, it generates a large amount of discrete and fragmented log text data. At present, unclear semantics and redundant words in the fault record text of CBTC on-board equipment make it difficult to trace the cause of a fault. In response, this paper proposes an automatic classification and diagnosis method for CBTC on-board equipment faults based on Doc2vec-LightGBM. First, Jieba is used to segment the fault text, features of the segmented text are extracted with the TF-IDF algorithm, and Doc2vec is then used to train vectors for the segmented text. Second, because the data are imbalanced, the Borderline-SMOTE algorithm is used to augment the text vectors of the minority classes. Finally, a LightGBM classifier is trained to classify the fault text automatically. A total of 1,133 fault text records from a signal manufacturer were used in the classification experiments, and the results were compared with those of the support vector machine (SVM) method. The experimental results show that the classification accuracy and recall of the proposed method are 98.2% and 97.5%, respectively, demonstrating the effectiveness and superiority of the automatic fault text classification method.
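To make the oversampling step concrete, the following is a minimal pure-Python sketch of the Borderline-SMOTE idea: only minority samples whose neighbourhood is dominated (but not fully occupied) by other classes are treated as borderline, and new points are interpolated between each of these and its nearest minority neighbours. The abstract does not state which variant or parameters were used, so the function name `borderline_smote`, the parameters `m`, `k` and `n_new_per_danger`, and the two-dimensional toy vectors standing in for Doc2vec vectors are all assumptions made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def borderline_smote(X, y, minority_label, m=5, k=3, n_new_per_danger=2, seed=0):
    """Return synthetic minority vectors (illustrative sketch, not the paper's code).

    m: neighbours (from all classes) used to decide whether a sample is borderline.
    k: nearest minority neighbours used as interpolation partners.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    synthetic = []
    for i in minority_idx:
        # m nearest neighbours among all other samples
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: euclidean(X[i], X[j]))
        n_other = sum(1 for j in others[:m] if y[j] != minority_label)
        # Borderline ("danger") sample: at least half of the neighbours, but
        # not all of them, belong to other classes. Samples with no minority
        # neighbours are treated as noise; the rest are considered safe.
        if not (m / 2 <= n_other < m):
            continue
        # k nearest neighbours from the minority class only
        peers = sorted((j for j in minority_idx if j != i),
                       key=lambda j: euclidean(X[i], X[j]))[:k]
        if not peers:
            continue
        for _ in range(n_new_per_danger):
            j = rng.choice(peers)
            gap = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy data: four "rare" vectors, one of which sits next to the "common" class.
X = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [1.0, 1.0],
     [1.1, 1.0], [1.0, 1.2], [2.0, 2.0], [2.2, 2.0], [2.0, 2.2]]
y = ["rare"] * 4 + ["common"] * 5

synthetic = borderline_smote(X, y, minority_label="rare", m=3, k=2)
print(len(synthetic), "synthetic minority vectors generated")
```

In this toy set only the vector at (1.0, 1.0) is borderline, so every synthetic point lies on a segment between it and one of its two nearest minority neighbours. For real experiments, a maintained implementation such as `BorderlineSMOTE` in the imbalanced-learn package would likely be preferable to a hand-written version.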