A BERT-BiGRU-WCELoss Short-Text Police Alert Classification Model for Severely Imbalanced Data

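Chinese abstract (translated): Text data from 110 emergency calls is extremely short, and its class distribution is severely imbalanced. To address this, a BERT-BiGRU-WCELoss police alert classification model is proposed. The model extracts the semantics of the text with a Chinese pre-trained BERT (Bidirectional Encoder Representations from Transformers) model; uses a BiGRU (Bidirectional Gated Recurrent Unit) to further distill the semantic features of the text; and, by optimizing the adaptive weighted loss function WCELoss (Weight Cross Entropy Loss function), assigns larger loss weights to minority-class samples. Experimental results show that the model achieves a classification accuracy of 95.83% on a dataset of 110 calls from one calendar month of 2015 in a certain city, and that its precision, recall, F1 score, and G_mean are all higher than those of traditional deep learning models and of models trained with cross-entropy loss.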
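
The abstract does not give the WCELoss formula, so the following is only a minimal sketch of the general idea it describes: a cross-entropy loss in which minority-class samples carry larger weights. The inverse-frequency weighting rule and the function names `inverse_frequency_weights` and `weighted_cross_entropy` are assumptions made for illustration, not the authors' adaptive weighting scheme.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting rule: w_c = N / (K * n_c).

    N is the number of samples, K the number of classes, and n_c the
    number of samples in class c, so rarer classes get larger weights.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n_c) for c, n_c in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -w_y * log(p_y), normalised by the sum of weights.

    probs[i][c] is the predicted probability of class c for sample i.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Toy data: three samples of class 0 and one sample of class 1.
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels)

# The classifier is confident on the majority class but wrong on the minority sample.
probs = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
uniform = {0: 1.0, 1: 1.0}
plain_loss = weighted_cross_entropy(probs, labels, uniform)
weighted_loss = weighted_cross_entropy(probs, labels, weights)
```

With these toy numbers the misclassified minority sample contributes more to the weighted loss than to the unweighted one, which is the effect the abstract attributes to WCELoss.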

English title: A BERT-BiGRU-WCELoss Classification Model for Handling Severely Unbalanced Short Alert Text Data

English abstract: To address the extremely short text length and severely imbalanced class distribution of 110 alarm text data, this paper proposes a BERT-BiGRU-WCELoss alarm classification model. The model extracts the semantics of the text with a Chinese pre-trained BERT (Bidirectional Encoder Representations from Transformers) model. A BiGRU (Bidirectional Gated Recurrent Unit) is used to further extract the semantic features of the text. By optimizing the adaptive weighted loss function WCELoss (Weight Cross Entropy Loss function), larger loss weights are assigned to minority-class samples. The experimental results show that the model achieves a classification accuracy of 95.83% on the 110 alarm dataset from one calendar month of 2015 in a certain city, with higher precision, recall, F1 score, and G_mean than traditional deep learning models and models trained with cross-entropy loss.
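
The abstract reports G_mean alongside accuracy but does not define it. A common definition for imbalanced classification is the geometric mean of the per-class recalls, and the sketch below assumes that definition; the paper may use a different variant. The function names `per_class_recall` and `g_mean` and the toy labels are hypothetical.

```python
import math
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall of each class: correctly predicted samples / true samples."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / support[c] for c in support}


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition of G_mean)."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1 / len(recalls))


# Toy data: eight majority samples, two minority samples, one minority miss.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = g_mean(y_true, y_pred)
```

Under this definition a single missed minority sample lowers G_mean far more than it lowers accuracy, which is why the metric is informative when classes are severely imbalanced.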

English keywords:

BERT; BiGRU; Classification of alarm text; Unbalanced data; Short text; Sample weighting

Authors:

Liu Dong (刘冬), Weng Haiguang (翁海光), Chen Yimin (陈一民)

Author affiliations:

Shanghai Police College, Shanghai 200137

Shanghai Jian Qiao University, Shanghai 201306

Keywords:

BERT; BiGRU; police alert classification; imbalanced data; short text; sample weighting

Funding:

Shanghai Police College research project

Project number:

23xkx53

Publication year:

2024

DOI:

10.3969/j.issn.1000-386x.2024.09.031

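Journal: Computer Applications and Software (计算机应用与软件)

Shanghai Institute of Computing Technology; Shanghai Development Center of Computer Software Technology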

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.615

ISSN: 1000-386X

Year, volume (issue): 2024, 41(9)