A Privacy-preserving Data Alignment Framework for Vertical Federated Learning

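Chinese abstract (translated): In vertical federated learning, the datasets held by the clients contain overlapping sample IDs and sample features of different dimensions, so data alignment is needed before model training. Existing data alignment techniques generally treat the intersection of the parties' sample IDs as public information, so achieving data alignment without leaking this ID intersection has become an urgent problem. Based on commutative encryption and homomorphic encryption, this paper constructs ALIGN, a privacy-preserving data alignment framework comprising data encryption, ciphertext blinding, ciphertext intersection, and feature splicing. Identical original sample IDs are transformed into identical ciphertexts through double commutative encryption, and sample features are blinded after homomorphic encryption. ALIGN computes the intersection of the parties' encrypted sample IDs, splices all feature data corresponding to the sample IDs in the intersection, and distributes the result to the parties as secret shares. Compared with existing data alignment techniques, the framework not only protects the privacy of the sample ID intersection but also securely deletes the information of samples outside the intersection. The security proof of ALIGN shows that, apart from the data size, no client can obtain any information about the other party's data through data alignment, which guarantees the effectiveness of the privacy-preserving strategy. Compared with existing work, for every additional 10% of redundant data, the alignment results produced by ALIGN shorten model training time by about 1.3 seconds while keeping model accuracy stable above 85%. Simulation results show that performing data alignment for vertical federated learning with ALIGN helps improve the efficiency and accuracy of subsequent model training.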
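
The abstract states that identical sample IDs become identical ciphertexts after double commutative encryption. The sketch below illustrates only that property, assuming an SRA/Pohlig-Hellman-style exponentiation cipher E_k(x) = x^k mod P with SHA-256 hashing into the group. The toy modulus, the helper names (hash_to_group, keygen, enc, private_intersection) and the single-process simulation are illustrative assumptions, not the paper's construction. Comparing double-encrypted IDs in this plain form reveals which ciphertexts match to whoever compares them; how ALIGN keeps the intersection itself hidden is not described in the abstract and is not reproduced here.

```python
import hashlib
import math
import secrets

# Toy modulus (assumption): the Mersenne prime 2**127 - 1, far too small for production use.
P = 2**127 - 1


def hash_to_group(sample_id: str) -> int:
    """Map a sample ID to a nonzero residue modulo P via SHA-256."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def keygen() -> int:
    """Secret exponent coprime to P - 1, so x -> x**k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def enc(key: int, x: int) -> int:
    """Commutative encryption E_k(x) = x**k mod P, so E_a(E_b(x)) == E_b(E_a(x))."""
    return pow(x, key, P)


def private_intersection(ids_a, ids_b):
    """Single-process simulation of the two encryption rounds (hypothetical helper).

    In a real run, A sends E_a(h(id)) to B, who applies E_b, and vice versa.
    The ciphertext -> ID lookup below exists only because both sides are simulated here.
    """
    key_a, key_b = keygen(), keygen()
    ct_a = {enc(key_b, enc(key_a, hash_to_group(i))): i for i in ids_a}
    ct_b = {enc(key_a, enc(key_b, hash_to_group(i))): i for i in ids_b}
    return sorted(ct_a[c] for c in ct_a.keys() & ct_b.keys())


print(private_intersection(["u001", "u002", "u003", "u005"],
                           ["u002", "u003", "u004", "u005"]))
# ['u002', 'u003', 'u005']
```

Choosing each key coprime to P - 1 makes exponentiation a bijection on the group, so distinct hashed IDs cannot collide after encryption, while the order of the two encryptions does not affect the final ciphertext.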

English title: A Privacy-preserving Data Alignment Framework for Vertical Federated Learning

English abstract: In vertical federated learning, the clients' datasets share overlapping sample IDs but hold features of different dimensions, so data alignment is necessary for model training. Because current data alignment technologies treat the intersection of the sample IDs as public, aligning the data without any leakage of this intersection has become a key issue. The proposed privacy-preserving data ALIGNment framework (ALIGN) is based on commutative encryption and homomorphic encryption technologies and mainly consists of data encryption, ciphertext blinding, private intersection, and feature splicing. The sample IDs are encrypted twice with a commutative encryption algorithm, so that identical plaintexts yield identical ciphertexts, and the sample features are encrypted with a homomorphic encryption algorithm and then randomly blinded. The intersection of the encrypted sample IDs is obtained, and the corresponding features are then spliced and secret-shared among the participants. Compared with existing technologies, our framework protects the privacy of the ID intersection and can safely remove the samples whose IDs fall outside the intersection. The security proof shows that no participant can obtain any knowledge about the other's data except its size, which guarantees the effectiveness of the privacy-preserving strategies. Simulation experiments show that the runtime is shortened by about 1.3 seconds and the model accuracy stays above 85% with every 10% reduction of redundant data. These results indicate that using the ALIGN framework for data alignment in vertical federated learning improves the efficiency and accuracy of subsequent model training.
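
The abstract also says that sample features are homomorphically encrypted, randomly blinded, and finally distributed as secret shares. A common way to realize this step, assumed here rather than taken from the paper, is additive masking under Paillier encryption: the party holding Enc(x) but not the key adds a random mask r homomorphically, the key holder decrypts x - r, and the two values x - r and r are additive shares of x modulo N. The toy primes, the integer (fixed-point) encoding of features, and all function names are hypothetical; row shuffling and the splicing of both parties' features are omitted.

```python
import math
import secrets

# Toy Paillier parameters (assumption): two small known primes; real keys need >= 1024-bit primes.
P, Q = 1_000_003, 999_983
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1


def _rand_unit() -> int:
    """Random r in Z_N^* used as encryption randomness."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return r


def encrypt(m: int) -> int:
    """Paillier encryption with g = N + 1: c = (1 + m*N) * r^N mod N^2."""
    return (1 + (m % N) * N) * pow(_rand_unit(), N, N2) % N2


def decrypt(c: int) -> int:
    """m = L(c^LAM mod N^2) * MU mod N, where L(u) = (u - 1) // N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add_plain(c: int, k: int) -> int:
    """Homomorphic addition of a plaintext constant: Enc(m) -> Enc(m + k)."""
    return c * pow(N + 1, k % N, N2) % N2


def blind_to_shares(c: int):
    """Blind Enc(x) with a random mask r, then let the key holder decrypt x - r."""
    r = secrets.randbelow(N)
    masked = add_plain(c, -r)          # Enc(x - r), computed without the secret key
    share_key_holder = decrypt(masked)  # done by the party holding the secret key
    share_blinder = r                   # kept by the party that applied the mask
    return share_key_holder, share_blinder


def reconstruct(s1: int, s2: int) -> int:
    """Add the additive shares mod N and map back to a signed integer."""
    v = (s1 + s2) % N
    return v - N if v > N // 2 else v


features = [37, -5, 1200]  # integer-encoded feature values (fixed-point encoding assumed)
print([reconstruct(*blind_to_shares(encrypt(x))) for x in features])
# [37, -5, 1200]
```

Each share on its own is uniformly distributed modulo N, so neither x - r nor r alone reveals the feature value x; only their sum modulo N recovers it.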

English keywords:

Vertical federated learning; Data alignment; Privacy protection; Commutative encryption; Homomorphic encryption

Authors:

Gao Ying (高莹), Xie Yuxin (谢雨欣), Deng Huanghao (邓煌昊), Zhu Zukun (朱祖坤), Zhang Yiyu (张一余)

Affiliations:

School of Cyber Science and Technology, Beihang University, Beijing 100191

Zhongguancun Laboratory, Beijing 100194

Keywords:

Vertical federated learning; Data alignment; Privacy protection; Commutative encryption; Homomorphic encryption

Funding:

Beijing Natural Science Foundation; Tencent WeChat Rhino-Bird Fund

Grant number:

M21033

Publication year:

2024

DOI:

10.11999/JEIT231234

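Journal of Electronics & Information Technology

Institute of Electronics, Chinese Academy of Sciences; Department of Information Sciences, National Natural Science Foundation of China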

CSTPCD; Peking University Core Journal

Impact factor: 1.302

ISSN: 1009-5896

Year, volume (issue): 2024, 46(8)