Journal of Harbin University of Science and Technology, 2024, Vol. 29, Issue 4: 21-28. DOI: 10.15938/j.jhust.2024.04.003

Improvement of Differential Privacy K-means Clustering Algorithm

Guo Rumin¹, Chen Xuebin², Shan Liyang³

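Author information

  • 1. College of Science, North China University of Science and Technology, Tangshan, Hebei 063210
  • 2. Hebei Key Laboratory of Data Science and Application, Tangshan, Hebei 063210
  • 3. Tangshan Key Laboratory of Data Science, Tangshan, Hebei 063210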

Abstract

Differential privacy K-means clustering suffers from poor clustering quality caused by the blind selection of initial centers and an unreasonable allocation of the privacy budget. To address these problems, the differential privacy K-means algorithm is improved. A new center selection scheme is designed based on two principles for choosing initial centers. The minimum privacy budget required in each iteration is calculated from the mean square error between the centroids of the original K-means algorithm and those of the differential privacy K-means algorithm, and this minimum is combined with the bisection method to establish a new privacy budget allocation scheme. In comparative experiments on three datasets with different characteristics, the improved algorithm raises the F-measure by 14%, reducing the impact of the added noise on clustering quality while keeping the clustering results usable.
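The abstract does not give the formulas behind the allocation scheme, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a Laplace-mechanism K-means on data scaled to [0, 1]^d, spends half of the remaining budget in each iteration, and never spends less than a floor `eps_min`. The names `allocate_budget`, `dp_kmeans` and `eps_min` are hypothetical; in the paper the per-iteration minimum is derived from the centroid mean square error, which is replaced here by a fixed user-supplied value.

```python
import random


def allocate_budget(total_eps, eps_min, max_iters):
    """Halve the remaining budget each round, but never go below eps_min.

    eps_min is a placeholder: the paper derives its per-iteration minimum
    from a centroid mean-square-error criterion not reproduced here.
    """
    budgets, remaining = [], total_eps
    for _ in range(max_iters):
        if remaining < eps_min:
            break  # too little budget left for a useful iteration
        eps_t = max(remaining / 2.0, eps_min)
        budgets.append(eps_t)
        remaining -= eps_t
    return budgets


def laplace(scale, rng):
    # The difference of two i.i.d. Exp(1) draws follows Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def nearest(point, centers):
    # Index of the center with the smallest squared Euclidean distance.
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centers[c])))


def dp_kmeans(points, centers, budgets, rng):
    """Laplace-mechanism K-means; assumes every coordinate lies in [0, 1]."""
    dim = len(points[0])
    for eps_t in budgets:
        # Adding or removing one point changes one cluster's coordinate sums
        # by at most dim and its count by 1, so the L1 sensitivity is dim + 1.
        scale = (dim + 1) / eps_t
        sums = [[0.0] * dim for _ in centers]
        counts = [0.0] * len(centers)
        for p in points:
            j = nearest(p, centers)
            counts[j] += 1
            for i, v in enumerate(p):
                sums[j][i] += v
        new_centers = []
        for j in range(len(centers)):
            # Guard against noisy counts that are tiny or negative.
            n = max(counts[j] + laplace(scale, rng), 1.0)
            # Clamping to [0, 1] is post-processing and costs no extra budget.
            new_centers.append([
                min(max((sums[j][i] + laplace(scale, rng)) / n, 0.0), 1.0)
                for i in range(dim)
            ])
        centers = new_centers
    return centers


# Usage on synthetic data: two well-separated groups in the unit square.
rng = random.Random(0)
data = [[rng.uniform(0.0, 0.3), rng.uniform(0.0, 0.3)] for _ in range(200)]
data += [[rng.uniform(0.7, 1.0), rng.uniform(0.7, 1.0)] for _ in range(200)]
budgets = allocate_budget(total_eps=1.0, eps_min=0.1, max_iters=10)
noisy_centers = dp_kmeans(data, [[0.2, 0.2], [0.8, 0.8]], budgets, rng)
```

Under sequential composition the total privacy cost is the sum of the per-iteration budgets, which this allocation keeps at or below `total_eps`. The floor stops the iteration early instead of running rounds in which the noise would swamp the centroids.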

Key words

privacy preserving techniques/K-means clustering/clustering algorithms

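Publication year

2024
Journal of Harbin University of Science and Technology
Harbin University of Science and Technology

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.508
ISSN: 1007-2683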