Computer Applications and Software (计算机应用与软件), 2024, Vol. 41, Issue 12: 173-181. DOI: 10.3969/j.issn.1000-386x.2024.12.025

CONTENT QUALITY DETECTION MODEL FOR WEB ARTICLES (面向网络文章的质量检测模型)

Wang Kainan (王凯楠) 1, Lin Xinxin (林欣欣) 1, Wang Wei (王薇) 2

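Author information

  • 1. School of Cyberspace Security, Changchun University, Changchun 130022, Jilin, China
  • 2. School of Computer Science and Technology, Changchun University, Changchun 130022, Jilin, China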

Abstract

The Internet contains a large number of articles of uneven quality, which seriously damages the online ecosystem. Detecting the quality of online articles is therefore an important and new task for building a green cyberspace. Based on the Tencent dataset, we study article quality detection along three dimensions: article organization features, writing features, and semantic features. We build three sub-networks (an organization sub-network, a feature sub-network, and a text sub-network) and extend three attention modes and four Transformer modes. Using CNN+BiGRU, Attention+ACNN, and Transformer model I, the three sub-networks reach classification accuracies of 80.6%, 87%, and 92.9%, respectively, and the OFT model framework, which combines the three sub-networks, reaches 93.3%. In addition, BERT word vectors for the text data are obtained in two ways, and the final accuracy of OFT reaches 94.2%. The experimental results show that the proposed model outperforms existing models.

Key words

Content quality detection / Four Transformer modes / Three attention modes / OFT model framework

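Publication year

2024
Computer Applications and Software (计算机应用与软件)
Shanghai Institute of Computing Technology; Shanghai Development Center of Computer Software Technology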

Indexed in: CSTPCD; PKU Core (北大核心)
Impact factor: 0.615
ISSN: 1000-386X