
Research on a Differential Privacy Data Synthesis Method Based on a Latent Diffusion Model

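Data sharing and publication let the value of data be put to effective use and can drive scientific and technological progress and socio-economic development in the era of digital intelligence. However, protecting data copyright and personal privacy while sharing data remains a major challenge. Differentially private data synthesis is an effective means of protecting data privacy: by publishing synthetic data in place of real data, data holders can protect privacy and, at the same time, improve the generality and usability of the data. To address the low usability of image samples synthesized by differentially private generative models, this paper proposes a two-stage differentially private generative model based on a latent diffusion model. First, differential-privacy-aware information compression is applied to the original images, projecting them from pixel space into a latent space to obtain desensitized latent-vector representations of the original sensitive data. Next, the latent vectors are fed into a diffusion model, which gradually transforms them into a prior distribution, and samples are drawn through the denoising process. Finally, the model is trained on the MNIST and Fashion MNIST datasets and used to synthesize data; the results show that it achieves clear improvements in both FID and downstream-task accuracy over state-of-the-art (SOTA) models such as DP-Sinkhorn.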
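The second stage described above (diffusing latent vectors toward a prior distribution and then sampling by denoising) can be illustrated with the standard DDPM formulation. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the linear β schedule, T = 1000, the N(0, I) prior and the hypothetical noise predictor `eps_model(z, t)` are illustrative choices that the abstract does not specify, and the differential-privacy-aware encoder that produces the latents in the first stage is not shown.

```python
import numpy as np

# Assumed DDPM-style linear noise schedule; T and the beta range are
# illustrative values, not settings reported in the paper.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def q_sample(z0, t, eps):
    """Forward process: draw z_t ~ q(z_t | z_0) in closed form from noise eps."""
    return np.sqrt(alpha_bars[t]) * z0 + np.sqrt(1.0 - alpha_bars[t]) * eps


def p_sample_step(zt, t, eps_hat, rng):
    """One reverse (denoising) step z_t -> z_{t-1}, given predicted noise eps_hat."""
    mean = (zt - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added on the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(zt.shape)


def sample(eps_model, shape, rng):
    """Ancestral sampling: start from the N(0, I) prior and denoise for T steps.

    eps_model(z, t) is a hypothetical trained noise predictor.
    """
    z = rng.standard_normal(shape)
    for t in reversed(range(T)):
        z = p_sample_step(z, t, eps_model(z, t), rng)
    return z
```

For example, `sample(lambda z, t: np.zeros_like(z), (16, 8), np.random.default_rng(0))` runs the sampling loop with a dummy predictor. In a setup like the one described, `eps_model` would be a network trained on the desensitized latents, and the sampled latents would presumably be mapped back to pixel space by the autoencoder's decoder.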
Differential Privacy Data Synthesis Method Based on Latent Diffusion Model
The widespread application of data sharing and publication in the socio-economic domain drives scientific progress and societal development. However, issues related to copyright and privacy, especially concerning personal data, remain critical challenges. Differential privacy data synthesis has emerged as an effective means of protecting data privacy: data holders can release synthetic data instead of real data, thereby enhancing data utility and availability while preserving privacy. In response to the limited usability of the samples synthesized by existing differential privacy generative models, this paper proposes a two-stage differential privacy generative model based on latent-space diffusion. First, differential-privacy-aware information compression is performed on the original image, projecting it from pixel space to latent space to obtain a desensitized latent vector representation of the original sensitive data. The latent vector is then fed into a diffusion model, which gradually transforms it into a prior distribution, and samples are generated through a denoising process. Experimental results on the MNIST and Fashion MNIST datasets demonstrate that the proposed model achieves significant improvements in Fréchet inception distance (FID) and downstream task accuracy compared to state-of-the-art models such as DP-Sinkhorn.
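Because FID is the headline image-quality metric in these results, a short sketch of the distance itself may help. FID fits a Gaussian to Inception-v3 features of real images and another to features of synthetic images, then reports the Fréchet distance ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}); lower is better. The NumPy function below is an illustrative sketch that computes only this distance from two precomputed feature matrices (rows are samples); feature extraction is omitted, the function names are hypothetical, and this is not the paper's evaluation code.

```python
import numpy as np


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between Gaussians fitted to two (n_samples x dim) feature matrices."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_fake, rowvar=False)
    # Tr((S1 S2)^{1/2}) equals Tr((S1^{1/2} S2 S1^{1/2})^{1/2}), and the inner
    # matrix is symmetric PSD, so a symmetric eigendecomposition suffices.
    r1 = _sqrtm_psd(s1)
    inner = r1 @ s2 @ r1
    inner = (inner + inner.T) / 2.0  # re-symmetrize against floating-point drift
    covmean_trace = np.trace(_sqrtm_psd(inner))
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1) + np.trace(s2) - 2.0 * covmean_trace)
```

As a sanity check, identical feature sets give a distance of about 0, and shifting every feature of a 4-dimensional set by 3.0 gives about 4 × 3² = 36, since only the mean term changes.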

Keywords: Differential privacy; Data synthesis; Generative models; Autoencoder; Diffusion models

Ge Yinchi, Zhang Hui, Sun Haohang


State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing 100191


2024

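Computer Science
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)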


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.944
ISSN: 1002-137X
Year, Vol. (Issue): 2024, 51(3)