The large number of articles of mixed quality on the Internet has seriously damaged the network ecology. To build a green cyberspace, online article quality detection is an important and new task. Based on the Tencent dataset, we investigated article quality detection along three dimensions (article organization features, writing features, and semantic features) and built three corresponding sub-networks: an organization sub-network, a feature sub-network, and a text sub-network. Three attention models and four Transformer models were extended; among them, CNN+BiGRU, Attention+ACNN, and Transformer model I brought the classification accuracy of the three sub-networks to 80.6%, 87%, and 92.9%, respectively. The OFT model framework, which combines the three sub-networks, reaches a classification accuracy of 93.3%. In addition, two methods were used to obtain BERT word vectors for the text data, with which the final OFT accuracy reaches 94.2%. The experimental results show that the proposed model outperforms existing methods.
Content quality inspection; Four modes of Transformer; Three modes of attention; OFT model framework
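
The abstract does not state how the OFT framework combines its three sub-networks, so the following is only a minimal sketch of one common option: concatenating the output vectors of the organization, feature, and text sub-networks and passing them through a single linear layer with softmax. The function names (`softmax`, `fuse_subnetworks`), the two-class setup, and all weights and input values are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_subnetworks(
    org_vec: Sequence[float],
    feat_vec: Sequence[float],
    text_vec: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Assumed fusion scheme: concatenate the three sub-network outputs,
    apply one linear layer, and return class probabilities."""
    joint = list(org_vec) + list(feat_vec) + list(text_vec)
    for row in weights:
        if len(row) != len(joint):
            raise ValueError("each weight row must match the concatenated length")
    logits = [
        sum(w * x for w, x in zip(row, joint)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy inputs: each sub-network emits a 2-dimensional vector (illustrative values).
org_vec = [0.2, 0.8]    # organization sub-network output
feat_vec = [0.1, 0.9]   # feature sub-network output
text_vec = [0.3, 0.7]   # text sub-network output

# Hypothetical weights: class 0 sums the first components, class 1 the second.
weights = [
    [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
]
bias = [0.0, 0.0]

probs = fuse_subnetworks(org_vec, feat_vec, text_vec, weights, bias)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

In a trained model the fusion weights would be learned jointly with, or on top of, the sub-networks; other schemes such as weighted voting over the sub-network predictions would fit the same description.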