Research on Unbalanced Big Data Classification Based on an Optimized Fuzzy C-means Algorithm

To address the classification of unbalanced big data, this paper proposes a classification algorithm based on an optimized fuzzy C-means algorithm. First, the C-means fuzzy crossover operator is calculated, the optimization function is defined, and the unbalanced gain of the big data is solved. Then, on the Spark classification platform, the value range of the condensed fuzzy nearest neighbor values of the big data samples is determined, and the unbalanced threshold vector is defined by enlarging the nearest neighbor values. This completes the classification process and the design of the classification method. Experimental results show that the method completely separates the sampling length intervals of positive-example and negative-example information, effectively resolves the sample confusion caused by inaccurate classification of unbalanced big data, and meets practical application requirements.
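The abstract does not define the crossover operator, the optimization function, or the unbalanced gain, so those steps are not reproduced here. As background only, the following is a minimal sketch of the standard fuzzy C-means iteration that the optimized variant builds on. The function name `fuzzy_c_means`, its parameters, and the toy data are illustrative assumptions, not taken from the paper, and the sketch runs on a single machine rather than on Spark.

```python
import random


def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means on a list of equal-length tuples (hypothetical helper)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    power = 2.0 / (m - 1.0)

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ))

        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dist = [
                max(sum((points[i][d] - centers[k][d]) ** 2
                        for d in range(dim)) ** 0.5, 1e-12)
                for k in range(c)
            ]
            new_row = [
                1.0 / sum((dist[k] / dist[j]) ** power for j in range(c))
                for k in range(c)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u


# Toy imbalanced data (assumed for illustration): 8 majority points, 2 minority points.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2), (0.4, 0.6),
        (10, 10), (10.5, 9.5)]
centers, memberships = fuzzy_c_means(data, c=2)
labels = [max(range(2), key=lambda k: row[k]) for row in memberships]
```

Each row of `memberships` gives the degree to which a sample belongs to each cluster, and `labels` assigns every sample to its highest-membership cluster. On strongly imbalanced data, plain fuzzy C-means can pull cluster centers toward the majority class, which is the kind of problem the paper's optimization targets.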

Keywords: optimized fuzzy C-means algorithm; unbalanced big data; crossover operator; chi-square test; condensed fuzzy nearest neighbor values

Zhuo Liujun (卓柳俊), Zeng Xinyi (曾心怡)

School of Information, Renmin University of China, Beijing 100872

Henan Academy of Social Sciences, Zhengzhou 450002

Funding: Henan Provincial Social Science Planning Project (2022CFX029)

2024

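Information Technology (信息技术)
Heilongjiang Information Technology Society; China Center for Information Industry Development; Electronic Information Center of the Ministry of Information Industry of China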

CSTPCD
Impact factor: 0.413
ISSN: 1009-2552
Year, volume (issue): 2024, (10)