Text Mining Algorithm for User Comment Clustering in Big Data Scenario
Traditional text data mining algorithms are less effective at text clustering in big data scenarios, so this paper proposes a user comment clustering algorithm based on text data mining for such scenarios. First, features are extracted from user comment data with an improved information gain algorithm, and the feature data are formed by extracting text keywords and imbalanced data items according to information entropy. The feature data are then clustered and mined with an improved clustering data mining algorithm. Finally, the improved clustering data mining algorithm is parallelized on the Spark framework. Experiments were designed to verify and analyze the performance of the proposed feature extraction algorithm and the clustering mining algorithm. The results show that the proposed algorithm outperforms the traditional algorithm in running time, accuracy, and speedup ratio in the big data scenario.
Big data; Feature extraction; Clustering mining; Parallelization
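
To make the feature extraction step concrete, the following is a minimal sketch of the standard (unimproved) information gain criterion that the abstract builds on. It is not the improved variant proposed in the paper, whose details are not given here. The function names, the toy comments, and the class labels are all hypothetical, and the sketch assumes each comment is already tokenized into a set of terms and carries a label.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Entropy of the labels minus the weighted entropy after
    splitting the documents on presence or absence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def rank_terms(docs, labels):
    """Return all terms sorted by information gain, highest first."""
    vocabulary = set().union(*docs)
    scored = [(t, information_gain(docs, labels, t)) for t in vocabulary]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy data: four tokenized comments with assumed labels.
docs = [
    {"fast", "delivery"},
    {"fast", "cheap"},
    {"broken", "late"},
    {"late", "refund"},
]
labels = ["pos", "pos", "neg", "neg"]

for term, gain in rank_terms(docs, labels):
    print(f"{term}: {gain:.3f}")
```

Terms that split the labels cleanly, such as "fast" and "late" in this toy data, receive the highest score, while terms that occur in only one comment score lower. Keeping only the top-ranked terms is one common way to reduce the feature space before clustering.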