Information Technology (信息技术), 2024, Issue 10: 14-21, 29. DOI: 10.13274/j.cnki.hdzj.2024.10.003

Research on unbalanced big data classification based on optimized fuzzy C-means algorithm

Zhuo Liujun (卓柳俊)¹, Zeng Xinyi (曾心怡)²

Author information

  • 1. School of Information, Renmin University of China, Beijing 100872
  • 2. Henan Academy of Social Sciences, Zhengzhou 450002

Abstract

To address the classification of unbalanced big data, this paper proposes a classification algorithm based on an optimized fuzzy C-means algorithm. First, the C-means fuzzy crossover operator is calculated, the optimization function is defined, and the unbalanced gain of the big data is solved. Using the Spark classification platform, the value range of the condensed fuzzy nearest neighbor values of the big data samples is determined. The unbalanced threshold vector is then defined by enlarging the nearest neighbor values, which completes the classification workflow and the design of the unbalanced big data classification method based on the optimized fuzzy C-means algorithm. Experimental results show that the method completely separates the sampling length intervals of positive-example and negative-example information, effectively resolves the sample confusion caused by inaccurate classification of unbalanced big data, and meets practical application requirements.
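For readers unfamiliar with the baseline, the following is a minimal sketch of standard fuzzy C-means, not the optimized variant this paper proposes: the crossover operator, unbalanced gain, Spark stage, and threshold vector are not specified in the abstract and are not reproduced here. The function name `fuzzy_c_means`, the fuzzifier `m = 2`, the initial centers, and the toy data are all assumptions chosen for illustration.

```python
# Standard fuzzy C-means (FCM) on 1-D data; NOT the paper's optimized variant.

def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate membership and center updates, starting from `centers`."""
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        memberships = []
        for x in points:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # A point sitting exactly on a center belongs fully to it.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [u / total for u in row]
            else:
                # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
                row = [1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
                       for d_j in dists]
            memberships.append(row)
        # Center update: mean of the points weighted by u_ij^m.
        centers = [
            sum((row[j] ** m) * x for row, x in zip(memberships, points))
            / sum(row[j] ** m for row in memberships)
            for j in range(len(centers))
        ]
    return centers, memberships


# Hypothetical imbalanced toy data: 8 majority points near 0, 2 minority near 10.
data = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 9.8, 10.2]
centers, memberships = fuzzy_c_means(data, centers=[0.0, 10.0])
# Harden the fuzzy memberships into class labels by taking the largest one.
labels = [row.index(max(row)) for row in memberships]
```

On this well-separated toy set the two minority points land in the second cluster. Real unbalanced data is rarely this clean: the centers can be pulled toward the majority class, and that is the kind of problem the paper's optimization is aimed at.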

Key words

optimized fuzzy C-means algorithm/unbalanced big data/crossover operator/chi-square test/condensed fuzzy nearest neighbor values


Funding

Henan Provincial Social Science Planning Project (2022CFX029)

Publication year

2024
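Information Technology (信息技术)
Heilongjiang Information Technology Society; China Center for Information Industry Development; Electronic Information Center, Ministry of Information Industry of China

CSTPCD
Impact factor: 0.413
ISSN: 1009-2552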