Computer Engineering & Science, 2024, Vol. 46, Issue 2: 244-252. DOI: 10.3969/j.issn.1007-130X.2024.02.007

Research on differential privacy protection for the Stacking algorithm

Dong Yanling¹, Zhang Shufen², Xu Jingcheng¹, Wang Haoshi¹

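Author information

  • 1. College of Science, North China University of Science and Technology, Tangshan, Hebei 063210; Hebei Key Laboratory of Data Science and Application, Tangshan, Hebei 063210; Tangshan Key Laboratory of Data Science, Tangshan, Hebei 063210
  • 2. College of Science, North China University of Science and Technology, Tangshan, Hebei 063210; Hebei Key Laboratory of Data Science and Application, Tangshan, Hebei 063210; Tangshan Key Laboratory of Data Science, Tangshan, Hebei 063210; Tangshan Key Laboratory of Big Data Security and Intelligent Computing, Tangshan, Hebei 063210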

Abstract

Homogeneous ensemble learning algorithms are more sensitive to noise, which makes it difficult to achieve both good predictive performance and effective privacy protection. To address this problem, a differential-privacy-based algorithm, DPStacking, is proposed. It combines the heterogeneous Stacking algorithm with differential privacy to improve both privacy protection and predictive performance. However, because both the low-level and high-level models of Stacking can be built from different learners, a privacy budget allocation scheme designed for one specific learner often does not carry over to a Stacking algorithm composed of arbitrary base learners and meta-learners. A privacy budget allocation scheme based on the meta-learner is therefore designed: it assigns different privacy budgets to the different components of the meta-learner's input according to the Pearson correlation coefficient and the parallel composition property of differential privacy. Theoretical analysis and experiments show that DPStacking satisfies ε-differential privacy. Compared with the differentially private random forest algorithm (DiffRFs), AdaBoost algorithm (DP-AdaBoost), and XGBoost algorithm (DPXGB), it protects data privacy effectively while achieving better predictive performance, and it better addresses the problem that a single homogeneous ensemble learning algorithm is more sensitive to noise.
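The abstract does not give the allocation formula, so the following is only a minimal sketch of one way a correlation-weighted budget split could look, not the authors' DPStacking scheme. It assumes the meta-learner's input is divided into disjoint blocks of records, that a block whose values correlate more strongly with the labels receives a larger budget, and that values are clipped to [0, 1] so a Laplace mechanism with sensitivity 1 applies. The names `pearson_abs`, `allocate_budgets`, `laplace_perturb`, and the `floor` parameter are hypothetical. Under parallel composition, mechanisms applied to disjoint blocks cost max(ε_i) overall, so the largest per-block budget is scaled to equal the total ε. The sketch computes correlations on the raw data, which a complete privacy analysis would also have to account for.

```python
import numpy as np

def pearson_abs(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sx, sy = x.std(), y.std()
    if sx == 0 or sy == 0:
        return 0.0
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return float(abs(cov / (sx * sy)))

def allocate_budgets(blocks, targets, epsilon, floor=0.1):
    """Hypothetical rule: eps_i = epsilon * w_i / max(w), w_i = max(|r_i|, floor).

    The floor keeps every budget strictly positive (an assumption of this sketch).
    """
    weights = np.array([max(pearson_abs(b, t), floor)
                        for b, t in zip(blocks, targets)])
    return epsilon * weights / weights.max()

def laplace_perturb(values, eps, sensitivity=1.0, rng=None):
    """Clip values to [0, 1] and add Laplace noise with scale sensitivity / eps."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    rng = np.random.default_rng() if rng is None else rng
    values = np.clip(np.asarray(values, dtype=float), 0.0, 1.0)
    return values + rng.laplace(0.0, sensitivity / eps, size=values.shape)

# Synthetic illustration: three disjoint blocks of records whose
# "base-learner outputs" track the labels with different accuracy.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=300).astype(float)
blocks_y = np.array_split(y, 3)
blocks_z = [np.clip(t + rng.normal(0.0, s, t.shape), 0.0, 1.0)
            for t, s in zip(blocks_y, [0.1, 0.3, 0.6])]

budgets = allocate_budgets(blocks_z, blocks_y, epsilon=1.0)
noisy = [laplace_perturb(z, e, rng=rng) for z, e in zip(blocks_z, budgets)]

# Parallel composition over disjoint blocks: the overall cost is the maximum.
print("per-block budgets:", budgets)
print("composed budget:", budgets.max())
```

Scaling by the maximum weight rather than the sum reflects the parallel-composition accounting; if the blocks were not disjoint, sequential composition would apply and the budgets would instead need to sum to ε.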

Key words

differential privacy/privacy budget allocation/Stacking algorithm/ensemble learning


Funding

National Natural Science Foundation of China (U20A20179)

Publication year

2024
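Computer Engineering & Science
College of Computer, National University of Defense Technology

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.787
ISSN: 1007-130X
Number of references: 12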