Content Quality Detection Model for Web Articles
Wang Kainan (王凯楠)¹, Lin Xinxin (林欣欣)¹, Wang Wei (王薇)²
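Author information
- 1. School of Cyberspace Security, Changchun University, Changchun, Jilin 130022, China
- 2. College of Computer Science and Technology, Changchun University, Changchun, Jilin 130022, China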
Abstract
The Internet hosts a large number of articles of highly uneven quality, which seriously damages the online ecosystem. To build a healthy ("green") cyberspace, detecting the quality of online articles is an important and new task. Using the Tencent dataset, we study article quality detection along three dimensions: organization features, writing features, and semantic features. We build three sub-networks (an organization sub-network, a feature sub-network, and a text sub-network) and extend three attention modes and four Transformer modes. With CNN+BiGRU, Attention+ACNN, and Transformer model I, the three sub-networks reach classification accuracies of 80.6%, 87%, and 92.9%, respectively, and OFT, the framework that combines the three sub-networks, reaches 93.3%. In addition, BERT word vectors are obtained for the text data in two ways, which raises the final accuracy of OFT to 94.2%. The experimental results show that the proposed model outperforms existing models.
Key words
Content quality detection / Four Transformer modes / Three attention modes / OFT model framework
Year of publication
2024