Differential Privacy Federated Learning Method Based on Knowledge Distillation
Differential privacy, as a privacy protection technique, has been widely applied in federated learning. Existing research on applying differential privacy to federated learning either fails to consider unlabeled public data or ignores differences in data volume between clients, which limits its applicability in real-world scenarios. This paper proposes a differential privacy federated learning method based on knowledge distillation, which introduces an unlabeled public dataset and accounts for differences in data volume between clients, and designs a dedicated differential privacy scheme for this scenario. First, clients are grouped into "large-data clients" and "general clients" according to the size of their data. A teacher model is trained on the data of the large-data clients and is used to add pseudo labels to the public dataset. The pseudo-labeled public dataset is then treated as a "special client" that participates in federated training together with the general clients. Differential privacy is adopted to protect client data; since the special client's data involves privacy only through its labels, it is allocated a larger privacy budget in federated training than the general clients. The total privacy budget is capped: the budget for the federated training stage is set to a fixed value, while the budget for the pseudo-labeling stage is adjusted according to the clients' privacy requirements and the parallel composition property of the privacy budget. Experiments on the MNIST and SVHN datasets show that, under the same privacy budget consumption, the trained model achieves higher accuracy than traditional methods. The scheme is scalable, and its flexible allocation of the privacy budget enables it to meet complex privacy requirements.
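To make the two-stage budget allocation concrete, the Python sketch below illustrates one way the capped total budget could be split and how parallel composition could keep the pseudo-labeling cost bounded. It is a minimal illustration under assumptions of our own: the function names (allocate_budgets, noisy_pseudo_label), the specific budget values, the PATE-style multi-teacher voting, and the Laplace noise scale are hypothetical and are not taken from the paper.

    import numpy as np

    def allocate_budgets(epsilon_total, epsilon_fed):
        """Split a capped total budget between the two stages.

        The federated-training stage gets a fixed value epsilon_fed; the
        remainder is left for the pseudo-labeling stage (sequential
        composition across the two stages).
        """
        assert 0.0 < epsilon_fed < epsilon_total
        epsilon_label = epsilon_total - epsilon_fed
        return epsilon_fed, epsilon_label

    def noisy_pseudo_label(teacher_votes, epsilon_label, num_classes, rng):
        """Aggregate teacher predictions into one pseudo label with DP noise.

        teacher_votes holds the predicted class from each teacher, one
        teacher per large-data client. Because the teachers are trained on
        disjoint client datasets, the mechanisms applied to them compose in
        parallel, so the stage-level cost stays at epsilon_label instead of
        summing over teachers. The Laplace scale here is a simplification.
        """
        counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
        counts += rng.laplace(scale=1.0 / epsilon_label, size=num_classes)
        return int(np.argmax(counts))

    rng = np.random.default_rng(0)
    eps_fed, eps_label = allocate_budgets(epsilon_total=8.0, epsilon_fed=4.0)
    votes = np.array([3, 3, 7, 3])  # hypothetical predictions from 4 teachers
    print(noisy_pseudo_label(votes, eps_label, num_classes=10, rng=rng))

Under these assumptions, lowering epsilon_fed leaves more budget for pseudo-labeling (and vice versa), which is the kind of flexibility in budget allocation the abstract refers to; the paper's actual mechanism and noise calibration may differ.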