
High-dimensional Data Publication Under Local Differential Privacy

High-dimensional data collected from large numbers of users is increasingly available, but it contains personal information, so using it while preserving user privacy is a significant challenge. This paper focuses on high-dimensional data publication under local differential privacy. Existing solutions first construct a probabilistic graphical model to generate a set of noisy low-dimensional marginal distributions of the input data, and then use these marginals to approximate the joint distribution of the input dataset and generate a synthetic dataset. However, existing methods are limited both when computing the marginal distributions of a large number of attribute pairs to construct the probabilistic graphical model and when computing the joint distributions of large attribute subsets within the model. To address these limitations, this paper proposes PrivHDP (High-dimensional Data Publication Under Local Differential Privacy). First, PrivHDP perturbs user data with random sampling response instead of the traditional privacy budget splitting strategy, and proposes an adaptive marginal distribution computation method to compute the marginal distributions of attribute pairs and construct a Markov network. Second, it replaces mutual information with a new measure of the correlation between attribute pairs, introduces a threshold filtering technique based on high-pass filtering to reduce the search space during construction of the probabilistic graph, and combines sufficient triangulation with the junction tree algorithm to obtain a set of attribute subsets. Finally, it computes the joint distributions over the attribute subsets based on joint distribution decomposition and redundancy elimination. Experiments on four real datasets show that PrivHDP outperforms comparable algorithms in terms of k-way queries and SVM classification accuracy, validating the utility and efficiency of the proposed method.
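The contrast between sampling and privacy budget splitting can be illustrated with a minimal sketch. The abstract does not specify PrivHDP's perturbation mechanism, so this example assumes generalized randomized response (GRR) as a stand-in: each simulated user reports a single randomly sampled attribute pair with the full budget epsilon, rather than reporting every pair with a fraction of it. The function names `grr_perturb` and `estimate_pair_marginals` are hypothetical and do not come from the paper.

```python
import math
import random
from collections import Counter
from itertools import combinations


def grr_perturb(value, domain_size, epsilon, rng):
    """Generalized randomized response (assumed mechanism, not from the paper).

    Keeps the true value with probability p = e^eps / (e^eps + d - 1),
    otherwise reports one of the other d - 1 values uniformly at random.
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + domain_size - 1)
    if rng.random() < p:
        return value
    other = rng.randrange(domain_size - 1)
    return other if other < value else other + 1


def estimate_pair_marginals(records, domain_sizes, epsilon, seed=0):
    """Estimate all pairwise marginals when each user reports one sampled pair."""
    rng = random.Random(seed)
    pairs = list(combinations(range(len(domain_sizes)), 2))
    reports = {pair: Counter() for pair in pairs}

    # In a real deployment each user samples and perturbs locally;
    # this loop only simulates that behaviour on one machine.
    for rec in records:
        i, j = rng.choice(pairs)
        d = domain_sizes[i] * domain_sizes[j]
        cell = rec[i] * domain_sizes[j] + rec[j]  # flatten the pair to one index
        reports[(i, j)][grr_perturb(cell, d, epsilon, rng)] += 1

    marginals = {}
    for (i, j), counts in reports.items():
        d = domain_sizes[i] * domain_sizes[j]
        n = sum(counts.values())
        p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
        q = 1.0 / (math.exp(epsilon) + d - 1)
        if n == 0:
            marginals[(i, j)] = [1.0 / d] * d  # no reports: fall back to uniform
        else:
            # Unbiased GRR estimator: (observed frequency - q) / (p - q).
            marginals[(i, j)] = [(counts[c] / n - q) / (p - q) for c in range(d)]
    return marginals
```

Each estimated marginal sums to one but may contain small negative entries, so a post-processing step (for example clipping and renormalising) would usually follow before the marginals are used to build a graphical model.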

Keywords: Local differential privacy; High-dimensional data; Data publication; Marginal distribution; Joint distribution

Cai Mengnan (蔡梦男), Shen Guohua (沈国华), Huang Zhiqiu (黄志球), Yang Yang (杨阳)


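College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106

Key Laboratory of Software Development and Verification Technology for High-Safety Systems, Ministry of Industry and Information Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106

Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210093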


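Funding: National Natural Science Foundation of China (U2241216); National Natural Science Foundation of China (61772270); Open Fund of the Key Laboratory of Civil Aviation Emergency Science and Technology (NJ2022022)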

2024

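Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)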


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.944
ISSN: 1002-137X
Year, volume (issue): 2024, 51(2)