A Spam Classification Method Based on the BERT-SELFATT-CNN Model

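Chinese abstract (translated): Traditional spam classification methods rely on static word vectors, which cannot handle polysemy (one word with several meanings) and extract too few features from long sequences. To address these problems, a spam classification method based on a BERT-SELFATT-CNN model is proposed. The dynamic text representation method BERT is used to pre-train on the email content and to generate word vectors that carry contextual semantic information. A self-attention layer, which supports parallel computation, then computes word-to-word similarities to capture long-distance information within sentences, and the resulting hidden-layer vectors are fed into a CNN to extract local features. In comparative experiments against existing models on a Chinese spam dataset, the model improves precision, recall and F1 score, and it also trains faster.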
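
The abstract describes a three-stage pipeline: contextual word vectors from BERT, a self-attention layer that scores every word against every other word, and a CNN that extracts local features from the resulting hidden vectors. The sketch below illustrates only the last two stages in plain Python, under stated assumptions: random vectors stand in for BERT output, attention is single-head with Q = K = V (no learned projections), and the filter widths (2 and 3), filter weights, max-over-time pooling and sigmoid output layer are hypothetical choices. It is not the authors' implementation, and the weights are untrained.

```python
import math
import random


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X.

    Simplification (assumption): single head, no learned projection matrices.
    X is a list of n token vectors of dimension d; returns n vectors of dimension d.
    """
    d = len(X[0])
    out = []
    for q in X:
        # Similarity of this token to every token in the sequence, near or far.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # Each output vector is an attention-weighted sum of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out


def conv1d_maxpool(H, kernel):
    """One CNN filter: slide a (width x d) kernel along the sequence, apply ReLU,
    then keep the largest response (max-over-time pooling)."""
    width, d = len(kernel), len(H[0])
    if len(H) < width:
        raise ValueError("sequence is shorter than the filter width")
    responses = []
    for i in range(len(H) - width + 1):
        s = sum(kernel[r][j] * H[i + r][j] for r in range(width) for j in range(d))
        responses.append(max(0.0, s))  # ReLU
    return max(responses)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def spam_probability(X, kernels, w_out, b_out):
    """Self-attention layer -> CNN filters with max pooling -> linear layer -> sigmoid."""
    H = self_attention(X)
    pooled = [conv1d_maxpool(H, k) for k in kernels]
    z = sum(w * p for w, p in zip(w_out, pooled)) + b_out
    return sigmoid(z)


if __name__ == "__main__":
    random.seed(0)
    n_tokens, dim = 6, 4
    # Random vectors standing in for BERT's contextual token embeddings (hypothetical).
    X = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_tokens)]
    # Two untrained filters of widths 2 and 3 (hypothetical weights).
    kernels = [[[random.uniform(-1, 1) for _ in range(dim)] for _ in range(w)] for w in (2, 3)]
    w_out, b_out = [0.5, -0.3], 0.1
    print(f"P(spam) = {spam_probability(X, kernels, w_out, b_out):.3f}")
```

Each row of attention scores depends only on the input vectors, not on earlier rows, so all rows can be computed at once (as matrix products in a real framework); this is the parallelism the abstract refers to.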

English title: A BERT-SELFATT-CNN model for spam classification

English abstract: In order to solve the problems of polysemous words and insufficient feature extraction from long-sequence information in traditional spam classification methods, a spam classification method based on the BERT-SELFATT-CNN model is proposed. The dynamic text representation method BERT is used to pre-train on the email content and generate word vectors carrying contextual semantic information. A self-attention layer, which supports parallel computation, calculates the similarity between words to mine long-distance information in sentences. The resulting hidden-layer vectors are input into a CNN to extract local features. Comparative experiments against existing models on a Chinese spam dataset show that the model improves precision, recall and F1 score, and also trains faster.
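
The comparison is reported in terms of precision, recall and F1 score. A minimal sketch of how these are computed for a binary spam filter, assuming spam is encoded as label 1 and treated as the positive class (the paper's own evaluation code is not given here): precision is TP/(TP+FP), recall is TP/(TP+FN), and F1 is their harmonic mean.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class (here: spam)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # spam caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # normal mail flagged as spam
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # spam missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # 1 = spam, 0 = normal mail (toy labels, not data from the paper)
    y_true = [1, 1, 1, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 1]
    print(precision_recall_f1(y_true, y_pred))  # -> (0.75, 0.75, 0.75)
```

Precision penalises normal mail wrongly flagged as spam, recall penalises spam that slips through, and F1 is high only when both are.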

English keywords:

spam; BERT; self-attention layer; CNN; text classification

Authors:

Gong Hongfang (龚红仿), Zhao Furong (赵富荣), Luo Rongrong (罗容容)

Affiliation:

School of Mathematics and Statistics, Changsha University of Science and Technology, Changsha, Hunan 410114

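Funding:

National Natural Science Foundation of China; Natural Science Foundation of Hunan Province; Key Project of the Hunan Provincial Department of Education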

Grant numbers:

61972055; 2021JJ30734; 18A145

Publication year:

2024

DOI：

10.3969/j.issn.1672-6146.2024.02.003

Journal of Hunan University of Arts and Science (Natural Science Edition) [湖南文理学院学报(自然科学版)]

Hunan University of Arts and Science

CHSSCD

Impact factor: 0.274

ISSN: 1672-6146

Year, volume (issue): 2024, 36(2)