A BERT-BIGRU-WCELOSS CLASSIFICATION MODEL FOR HANDLING SEVERELY IMBALANCED SHORT ALERT TEXT DATA
110 emergency-call (police alert) text data are extremely short and exhibit a severely imbalanced class distribution. To address this, this paper proposes a BERT-BiGRU-WCELoss alert classification model. The model extracts text semantics with a Chinese pre-trained BERT (Bidirectional Encoder Representations from Transformers) model, further refines the semantic features with a BiGRU (Bidirectional Gated Recurrent Unit), and assigns larger loss weights to minority-class samples through an adaptive weighted loss function, WCELoss (Weighted Cross-Entropy Loss). Experimental results show that the model achieves a classification accuracy of 95.83% on a dataset of 110 emergency calls from one calendar month of 2015 in a certain city, and that its precision, recall, F1 score, and G-mean are all higher than those of traditional deep learning models and of models trained with the standard cross-entropy loss.
Keywords: BERT; BiGRU; Classification of alarm text; Imbalanced data; Short text; Sample weighting
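The core idea of WCELoss described in the abstract, giving minority classes larger loss weights, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the inverse-frequency weighting scheme and all function names here are assumptions for demonstration.

```python
import math
from collections import Counter

def class_weights(labels):
    """Inverse-frequency class weights: rarer classes get larger weights.

    Assumed scheme (not necessarily the paper's): weight = N / (K * n_c),
    where N is the total sample count, K the number of classes, and n_c
    the count of class c.
    """
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}

def weighted_cross_entropy(probs, label, weights):
    """Weighted cross-entropy for one sample.

    probs:   predicted probability for each class (dict: class -> p)
    label:   the true class of this sample
    weights: per-class loss weights (dict: class -> w)
    """
    return -weights[label] * math.log(probs[label])

# Example: 90 majority-class samples vs. 10 minority-class samples.
labels = [0] * 90 + [1] * 10
w = class_weights(labels)
# The minority class (1) receives a much larger weight than class 0,
# so misclassifying a minority sample is penalised more heavily.
```

With this weighting, a confident mistake on a minority-class sample contributes a far larger loss than the same mistake on a majority-class sample, which is what pushes the classifier away from always predicting the majority class.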