Journal of Nanjing University of Science and Technology (Natural Science Edition), 2024, Vol. 48, Issue 3: 335-341. DOI: 10.14177/j.cnki.32-1397n.2024.48.03.011

Improved K-modes clustering algorithm based on rough entropy

Liu Caihui (刘财辉) 1, Zeng Xiong (曾雄) 2, Xie Dehua (谢德华) 1

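Author information

  • 1. School of Mathematics and Computer Science, Gannan Normal University, Ganzhou 341000, Jiangxi, China
  • 2. School of Mechanical and Electronic Engineering, Ji'an Vocational and Technical College, Ji'an 343000, Jiangxi, China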

Abstract

The K-modes clustering algorithm is widely used in artificial intelligence, data mining, and related fields. The traditional K-modes algorithm clusters well, but it requires many iterations, is computationally expensive, and is easily disturbed by redundant attributes. In addition, it defines the distance between the attribute values of two samples by simple 0-1 matching, which does not fully account for the influence of each attribute on the clustering result. To address these problems, this paper introduces rough entropy into the K-modes algorithm. First, a rough-set attribute reduction algorithm is used to eliminate redundant attributes and determine the importance of each attribute. Then, rough entropy is used to determine the weight of each attribute, from which a new intra-class distance is defined. The proposed algorithm is compared with the traditional K-modes algorithm on four public data sets. The experimental results show that the proposed algorithm achieves higher clustering accuracy than the traditional K-modes algorithm.
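The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea of replacing plain 0-1 matching with an attribute-weighted mismatch distance. It assumes one common definition of the rough entropy of a partition, E_r = Σ (|X_i|/|U|) · log2|X_i|, and a hypothetical weighting rule (weight proportional to log2|U| − E_r); the function names `rough_entropy`, `attribute_weights`, and `weighted_distance` are illustrative and are not taken from the paper. The attribute reduction step is not shown.

```python
from collections import Counter
from math import log2


def rough_entropy(column):
    """Rough entropy of the partition induced by one categorical attribute.

    Assumed definition: sum over equivalence classes X_i of
    (|X_i| / |U|) * log2(|X_i|). It is 0 for the finest partition and
    log2(|U|) when every sample shares the same value.
    """
    n = len(column)
    return sum((size / n) * log2(size) for size in Counter(column).values())


def attribute_weights(rows):
    """Hypothetical weighting: attributes with lower rough entropy
    (finer partitions) receive larger weights; weights sum to 1."""
    n = len(rows)
    columns = list(zip(*rows))
    raw = [log2(n) - rough_entropy(col) for col in columns]
    total = sum(raw)
    if total == 0:
        # No attribute distinguishes any samples: fall back to equal weights.
        return [1.0 / len(columns)] * len(columns)
    return [r / total for r in raw]


def weighted_distance(sample, mode, weights):
    """Weighted 0-1 matching: add an attribute's weight when values differ."""
    return sum(w for a, b, w in zip(sample, mode, weights) if a != b)


# Toy data: the second attribute is constant, so it carries no weight here.
rows = [("a", "x"), ("a", "x"), ("b", "x"), ("b", "x")]
weights = attribute_weights(rows)
dist = weighted_distance(("a", "x"), ("b", "y"), weights)
```

In this toy data set, a mismatch on the constant second attribute contributes nothing to the distance, whereas plain 0-1 matching would count both mismatches equally.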

Key words

clustering/K-modes algorithm/rough sets/rough entropy/attribute reduction/weight


Funding

National Natural Science Foundation of China (62166001)

Natural Science Foundation of Jiangxi Province (20202BAB202010)

Publication year

2024
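Journal of Nanjing University of Science and Technology (Natural Science Edition)
Nanjing University of Science and Technology

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.526
ISSN: 1005-9830
Number of references: 18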