
A BERT-SELFATT-CNN model for spam classification

Traditional spam classification methods rely on static word vectors, which cannot handle polysemy, and they extract too little feature information from long sequences. To address these problems, a spam classification method based on the BERT-SELFATT-CNN model is proposed. BERT, a dynamic text representation method, is pre-trained on the email content and generates word vectors that carry contextual semantic information. A self-attention layer, which can be computed in parallel, then calculates the similarity between words to capture long-distance information within a sentence, and the resulting hidden-layer vectors are fed into a CNN to extract local features. Comparative experiments against existing models on a Chinese spam dataset show that the model improves precision, recall, and F1 score, and that its training speed also improves.
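The pipeline described in the abstract (contextual vectors, then self-attention, then a CNN) can be illustrated with a minimal, dependency-free sketch. Everything specific here is an assumption rather than the authors' implementation: random vectors stand in for BERT output, the attention uses Q = K = V with no learned projections, and the kernel widths, dimensions, and the final sigmoid classifier are hypothetical choices, since the abstract does not give them.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(hidden):
    """Scaled dot-product self-attention with Q = K = V = hidden.

    Assumption: learned projection matrices are omitted for brevity.
    """
    dim = len(hidden[0])
    attended = []
    for query in hidden:
        # Similarity of this token to every token in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in hidden]
        weights = softmax(scores)
        # Weighted sum of all token vectors: long-distance context.
        attended.append([sum(w * value[j] for w, value in zip(weights, hidden))
                         for j in range(dim)])
    return attended


def conv_max_pool(hidden, kernels):
    """Slide each kernel along the token axis, apply ReLU, max-pool over time."""
    dim = len(hidden[0])
    features = []
    for kernel in kernels:
        width = len(kernel)
        activations = []
        for start in range(len(hidden) - width + 1):
            total = sum(kernel[i][j] * hidden[start + i][j]
                        for i in range(width) for j in range(dim))
            activations.append(max(0.0, total))
        features.append(max(activations))
    return features


def spam_probability(features, weights, bias):
    """Hypothetical output layer: logistic regression on the pooled features."""
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
seq_len, dim = 6, 4  # toy sizes; real BERT vectors are much larger

# Stand-in for BERT output: one random vector per token.
hidden = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seq_len)]
# Two untrained kernels spanning 2 and 3 tokens (assumed widths).
kernels = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
           for width in (2, 3)]
weights = [rng.uniform(-1, 1) for _ in kernels]

attended = self_attention(hidden)
features = conv_max_pool(attended, kernels)
prob = spam_probability(features, weights, bias=0.0)
print(f"spam probability (untrained, random weights): {prob:.3f}")
```

In a real system the stand-in vectors would come from a pre-trained BERT encoder and all weights would be learned; a tensor library would also replace the explicit loops.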

spam; BERT; self-attention layer; CNN; text classification

Gong Hongfang (龚红仿), Zhao Furong (赵富荣), Luo Rongrong (罗容容)


School of Mathematics and Statistics, Changsha University of Science and Technology, Changsha, Hunan 410114


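Funding: National Natural Science Foundation of China (61972055); Natural Science Foundation of Hunan Province (2021JJ30734); Key Project of the Education Department of Hunan Province (18A145)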

2024

Journal of Hunan University of Arts and Science (Natural Science Edition)
Hunan University of Arts and Science


CHSSCD
Impact factor: 0.274
ISSN: 1672-6164
Year, volume (issue): 2024, 36(2)