Computer Applications and Software, 2024, Vol. 41, Issue 9: 217-223, 229. DOI: 10.3969/j.issn.1000-386x.2024.09.031

A BERT-BiGRU-WCELoss Classification Model for Handling Severely Unbalanced Short Alert Text Data

Liu Dong (1), Weng Haiguang (1), Chen Yimin (2)

Author information

  • 1. Shanghai Police College, Shanghai 200137
  • 2. Shanghai Jian Qiao University, Shanghai 201306

Abstract

To address the extremely short text length and severely imbalanced class distribution of 110 alarm text data, this paper proposes a BERT-BiGRU-WCELoss alarm classification model. The model extracts the semantics of the text with a Chinese pre-trained BERT (Bidirectional Encoder Representations from Transformers) model, uses a BiGRU (Bidirectional Gated Recurrent Unit) to refine the semantic features of the text, and assigns larger loss weights to minority-class samples through an optimized adaptive weighted loss function, WCELoss (Weight Cross Entropy Loss function). Experimental results show that the model achieves a classification accuracy of 95.83% on a 110 alarm dataset covering one calendar month of 2015 in a certain city, and that its precision, recall, F1 score, and G_mean are all higher than those of traditional deep learning models and of models trained with the standard cross-entropy loss.
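The abstract does not give the WCELoss formula, so the following is only a minimal sketch of the general idea of class-weighted cross-entropy, together with the G_mean metric it reports. The weighting rule (`inverse_frequency_weights`, weight = N / (K × class count)) is an assumed, commonly used heuristic and not the paper's adaptive scheme; all function names are hypothetical, and plain Python lists of logits stand in for the BERT-BiGRU output.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Assumed heuristic: weight_c = N / (K * count_c).

    This is NOT the paper's WCELoss weighting, which the abstract
    does not specify. Rare classes receive larger weights.
    """
    counts = Counter(labels)
    total = len(labels)
    # Classes absent from the data get weight 0.0 to avoid division by zero.
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def weighted_cross_entropy(batch_logits, labels, class_weights):
    """Weighted mean of per-sample negative log-likelihoods.

    Each sample's loss is scaled by the weight of its true class, and
    the total is divided by the sum of the weights that were applied.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for logits, y in zip(batch_logits, labels):
        w = class_weights[y]
        weighted_sum += -w * log_softmax(logits)[y]
        weight_total += w
    return weighted_sum / weight_total


def g_mean(y_true, y_pred, num_classes):
    """Geometric mean of per-class recall over classes present in y_true."""
    recalls = []
    for c in range(num_classes):
        support = sum(1 for t in y_true if t == c)
        if support == 0:
            continue  # skip classes with no true samples
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / support)
    return math.prod(recalls) ** (1.0 / len(recalls))


if __name__ == "__main__":
    train_labels = [0] * 8 + [1] * 2          # toy 8:2 imbalance
    weights = inverse_frequency_weights(train_labels, num_classes=2)
    logits = [[2.0, 0.0], [2.0, 0.0]]         # both samples predicted as class 0
    print(weights)
    print(weighted_cross_entropy(logits, [0, 1], weights))
    print(g_mean([0, 0, 1, 1], [0, 0, 1, 0], num_classes=2))
```

In this toy setup the misclassified minority-class sample dominates the loss, which is the effect the abstract attributes to weighting. G_mean drops sharply whenever any single class has low recall, which is why it is a useful companion to accuracy on imbalanced data.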

Key words

BERT/BiGRU/Classification of alarm text/Imbalanced data/Short text/Sample weighting

Funding

Shanghai Police College research project (23xkx53)

Publication year

2024
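Computer Applications and Software
Shanghai Institute of Computing Technology; Shanghai Development Center of Computer Software Technology

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.615
ISSN: 1000-386X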