Shuffled differential privacy protection method for K-Modes clustering data collection and publication

To address the insufficient security of current clustering data collection and publication, and to protect user privacy in clustering data while improving data quality, a privacy protection method for K-Modes clustering data collection and publication that requires no trusted third party is proposed, based on the shuffled differential privacy model. First, a K-Modes clustering data collection algorithm samples the user data and adds noise, and a value-domain-padding random permutation publication algorithm then scrambles the initial order of the sampled data, so that a malicious attacker cannot identify a target user from the relationship between users and data. Then, with the interference of noise reduced as far as possible, new centroids are computed by cyclic iteration to complete the clustering. Finally, the privacy, feasibility, and complexity of the three methods above are analyzed theoretically, and the proposed method is compared in accuracy and entropy on three real data sets against authoritative comparable algorithms from recent years, such as KM, DPLM, and LDPKM, to verify its effectiveness. The experimental results show that the proposed method outperforms current comparable algorithms in both privacy protection and the quality of the published data.
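The collect, shuffle, and cluster pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: generalized randomized response stands in for the paper's noise mechanism, the privacy budget is split evenly across attributes as a simplifying assumption, and the sampling and value-domain padding steps are not reproduced. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def krr_perturb(value, domain, epsilon, rng):
    """Generalized randomized response (a stand-in noise mechanism):
    keep the true value with probability e^eps / (e^eps + |domain| - 1),
    otherwise report a uniformly chosen different value."""
    k = len(domain)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    return rng.choice([v for v in domain if v != value])

def local_randomize(record, domains, epsilon, rng):
    """User side: perturb every categorical attribute.
    Assumption: the budget epsilon is split evenly across attributes."""
    eps_attr = epsilon / len(record)
    return tuple(krr_perturb(v, d, eps_attr, rng) for v, d in zip(record, domains))

def shuffle_reports(reports, rng):
    """Shuffler: output the reports in a uniformly random order,
    breaking the link between a report and its position of origin."""
    shuffled = list(reports)
    rng.shuffle(shuffled)
    return shuffled

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(records, k, rng, max_iter=20):
    """Analyzer side: plain K-Modes, iterating until the modes stop changing."""
    modes = rng.sample(records, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for r in records:
            nearest = min(range(k), key=lambda i: hamming(r, modes[i]))
            clusters[nearest].append(r)
        new_modes = []
        for j, members in enumerate(clusters):
            if not members:
                new_modes.append(modes[j])  # keep the old mode for an empty cluster
                continue
            # New mode: the most frequent value of each attribute in the cluster.
            new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                                   for col in zip(*members)))
        if new_modes == modes:
            break
        modes = new_modes
    return modes

# Toy run on synthetic data (hypothetical domains and users).
rng = random.Random(0)
domains = [("a", "b", "c"), ("x", "y"), ("u", "v", "w")]
users = [("a", "x", "u")] * 50 + [("c", "y", "w")] * 50
reports = shuffle_reports([local_randomize(u, domains, 6.0, rng) for u in users], rng)
modes = k_modes(reports, k=2, rng=rng)
print(modes)
```

The analyzer here sees only the shuffled, perturbed reports, which mirrors the abstract's setting without a trusted third party; how closely the recovered modes match the true ones depends on the budget and on the initial modes drawn.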

Keywords: shuffled differential privacy; K-Modes clustering; privacy protection; data collection; data publication

Jiang Weijin (蒋伟进), Chen Yilin (陈艺琳), Han Yuqing (韩裕清), Wu Yuting (吴玉庭), Zhou Wei (周为), Wang Haijuan (王海娟)


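School of Computer Science, Hunan University of Technology and Business, Changsha 410205, Hunan, China

School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, Hubei, China

Xiangjiang Laboratory, Changsha 410205, Hunan, China

School of Frontier Crossover Studies, Hunan University of Technology and Business, Changsha 410205, Hunan, China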



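Funding: National Natural Science Foundation of China (72088101, 61772196); Key Program of the Natural Science Foundation of Hunan Province (2020JJ4249); Hunan Provincial Key Laboratory of New Retail Virtual Reality Technology (2017TP1026); Key Scientific Research Project of the Hunan Provincial Department of Education (21A0374); Hunan Provincial Degree and Graduate Teaching Reform Project (2022JGYB194)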

2024

Journal on Communications (通信学报)
China Institute of Communications (中国通信学会)


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.265
ISSN: 1000-436X
Year, volume (issue): 2024, 45(1)