Twitter Black Market Accounts Classification Model Incorporating BiLSTM and CNN
Foreign social platforms such as Twitter have become indispensable tools for cyber black- and gray-market crime, so discovering, detecting, and classifying black- and gray-market accounts on Twitter is of great significance for combating cybercrime and maintaining social stability. Among existing tweet classification models, the bidirectional long short-term memory (BiLSTM) network can learn the contextual information of tweets but not their local key information, whereas the convolutional neural network (CNN) can learn local key information but not contextual information. Combining the strengths of both, this paper proposes a BiLSTM-CNN tweet classification model. The model vectorizes each tweet, feeds the vectors into a BiLSTM to learn contextual information, applies a CNN layer after the BiLSTM to extract local features, concatenates the pooled features through a fully connected layer, and applies the softmax function for four-class classification. Experiments are conducted on a self-constructed dataset of Chinese black- and gray-market tweets from Twitter, with TextCNN, TextRNN, and TextRCNN as baseline models. The results show that the proposed BiLSTM-CNN model achieves a macro-accuracy of 98.32% over the four tweet classes, clearly higher than the accuracy of TextCNN, TextRNN, and TextRCNN.
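The pipeline described in the abstract (vectorization, BiLSTM, CNN layer, pooling, fully connected layer, softmax over four classes) can be illustrated with a minimal forward-pass sketch in plain Python. This is not the authors' implementation: the class names `LSTM` and `BiLSTMCNN`, all layer sizes, the kernel width, the ReLU activation, the choice of max pooling over time, the omission of bias terms, and the random (untrained) weights are assumptions made only for illustration.

```python
import math
import random

random.seed(0)  # reproducible stand-in weights

def rand_matrix(rows, cols):
    """Small random matrix standing in for trained parameters (assumption)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(z):
    top = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - top) for x in z]
    total = sum(e)
    return [x / total for x in e]

class LSTM:
    """Single-direction LSTM layer; one weight matrix per gate over [x; h], no biases."""
    def __init__(self, in_dim, hid):
        self.hid = hid
        self.w = {gate: rand_matrix(hid, in_dim + hid) for gate in "ifog"}

    def run(self, seq):
        h, c, out = [0.0] * self.hid, [0.0] * self.hid, []
        for x in seq:
            xh = x + h  # concatenate input and previous hidden state
            i = [sigmoid(v) for v in matvec(self.w["i"], xh)]    # input gate
            f = [sigmoid(v) for v in matvec(self.w["f"], xh)]    # forget gate
            o = [sigmoid(v) for v in matvec(self.w["o"], xh)]    # output gate
            g = [math.tanh(v) for v in matvec(self.w["g"], xh)]  # candidate cell
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            out.append(h)
        return out

class BiLSTMCNN:
    """Embedding -> BiLSTM -> 1-D convolution -> max pooling -> dense -> softmax."""
    def __init__(self, vocab=50, emb=8, hid=6, filters=5, kernel=3, classes=4):
        self.emb = rand_matrix(vocab, emb)
        self.fwd, self.bwd = LSTM(emb, hid), LSTM(emb, hid)
        self.kernel = kernel
        self.conv = rand_matrix(filters, kernel * 2 * hid)
        self.fc = rand_matrix(classes, filters)

    def forward(self, token_ids):
        x = [self.emb[t] for t in token_ids]             # vectorize the tweet
        hf = self.fwd.run(x)                             # left-to-right context
        hb = self.bwd.run(x[::-1])[::-1]                 # right-to-left context
        ctx = [a + b for a, b in zip(hf, hb)]            # concatenate both directions
        windows = [sum(ctx[i:i + self.kernel], [])       # flatten each sliding window
                   for i in range(len(ctx) - self.kernel + 1)]
        feats = [[max(0.0, v) for v in matvec(self.conv, w)] for w in windows]  # ReLU
        pooled = [max(col) for col in zip(*feats)]       # max over time, per filter
        return softmax(matvec(self.fc, pooled))          # four class probabilities

model = BiLSTMCNN()
probs = model.forward([3, 17, 42, 8, 25, 1])  # hypothetical token ids for one tweet
predicted_class = probs.index(max(probs))
```

Placing the convolution after the BiLSTM means each filter slides over context-aware hidden states rather than raw embeddings, which is how the model combines contextual information with local features. With untrained weights the four probabilities are close to uniform; only the data flow and tensor shapes are meaningful here.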
Keywords: text classification; bidirectional long short-term memory (BiLSTM); convolutional neural network (CNN); black market; Twitter