Microblog Noise Filtering Method Based on Self-adaptive Characteristics
Microblog noise filtering removes garbage samples and reduces data scale. Noise seed words are generated by a clustering algorithm. The FP-Growth algorithm is then used to expand the seed words on unlabeled data, producing a noise feature word dictionary. Combining user and content characteristics, a support vector machine model is introduced to filter noise microblogs. The experimental results show that the precision is 84%, the recall is 79%, and the F1 value is 81%, which shows that the noise characteristics generated by the model help improve the filtering of noisy microblogs.
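
As a quick consistency check on the reported figures, the F1 value can be recomputed from the stated precision and recall using the standard harmonic-mean definition. The sketch below is only an illustration; the function name `f1_from_pr` is hypothetical and not taken from the paper.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
reported_precision = 0.84
reported_recall = 0.79

f1 = f1_from_pr(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}")  # rounds to 0.81, matching the reported 81%
```

The recomputed value agrees with the reported F1 of 81% after rounding to two decimal places.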