Text Mining Algorithm for User Comment Clustering in Big Data Scenarios

Traditional text data mining algorithms cluster text poorly in big data scenarios, so this paper proposes a user comment clustering algorithm based on text data mining for such scenarios. First, an improved information gain algorithm is designed to extract features from user comment data: text keywords and imbalanced data items are extracted according to information entropy to form the feature data. Next, an improved clustering data mining algorithm is used to cluster the feature data. Finally, the improved clustering algorithm is parallelized on the Spark framework. Experiments are designed to verify and analyze the performance of the proposed feature extraction algorithm and clustering mining algorithm. The results show that, in big data scenarios, the proposed algorithm outperforms traditional algorithms in running time, accuracy, and speedup ratio.
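The abstract does not give the details of the improved information gain algorithm. As a rough illustration of the underlying idea only, the sketch below computes textbook information gain for candidate keywords and keeps the highest-scoring terms. It assumes each comment is already tokenized into a set of terms and carries a class label (for example, one derived from a rating). The function names, the toy data, and the labels are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Textbook IG(term) = H(C) - P(t) H(C|t) - P(not t) H(C|not t).

    This is the standard definition, not the paper's improved variant.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = 0.0
    for subset in (with_term, without_term):
        if subset:  # skip empty partitions to avoid dividing by zero
            conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def top_terms(docs, labels, k):
    """Return the k terms with the highest information gain (ties by name)."""
    vocab = sorted(set().union(*docs))
    scored = [(information_gain(docs, labels, t), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


# Hypothetical toy comments, each already tokenized into a set of terms.
docs = [
    {"fast", "delivery", "good"},
    {"good", "quality"},
    {"slow", "delivery", "bad"},
    {"bad", "quality"},
]
labels = ["pos", "pos", "neg", "neg"]

print(top_terms(docs, labels, 2))
```

In this toy data, "good" and "bad" separate the two classes perfectly and therefore score highest, while "delivery" and "quality" appear equally in both classes and score zero. The selected terms could then serve as the feature vocabulary passed to a clustering step.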

Big data; Feature extraction; Clustering mining; Parallelization

Wang Honglin, Li Zhongwei

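School of Artificial Intelligence (School of Future Technology), Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044, China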

National Natural Science Foundation of China, Young Scientists Fund (62101274)

2024

Computer Simulation
The 17th Research Institute of China Aerospace Science and Industry Corporation

CSTPCD
Impact factor: 0.518
ISSN: 1006-9348
Year, volume (issue): 2024, 41(3)
  • 14