A Privacy-preserving Data Alignment Framework for Vertical Federated Learning
In vertical federated learning, the clients' datasets share overlapping sample IDs but hold features of different dimensions, so data alignment is necessary before model training. Because current data alignment techniques make the intersection of the sample IDs public, how to align the data without leaking the intersection becomes a key issue. The proposed privacy-preserving data ALIGNment framework (ALIGN) is based on interchangeable encryption and homomorphic encryption, and mainly comprises data encryption, ciphertext blinding, private intersection, and feature splicing. The sample IDs are encrypted twice with an interchangeable encryption algorithm, so that identical ciphertexts correspond to identical plaintexts, and the sample features are encrypted and then randomly blinded with a homomorphic encryption algorithm. The intersection of the encrypted sample IDs is then obtained, and the corresponding features are spliced and secret-shared among the participants. Compared with existing techniques, our framework protects the privacy of the ID intersection and allows the samples whose IDs fall outside the intersection to be removed safely. The security proof shows that no participant can learn anything about the others except the data size, which guarantees the effectiveness of the privacy-preserving strategies. Simulation experiments demonstrate that, for every 10% reduction in redundant data, the runtime is shortened by about 1.3 seconds while the model accuracy remains above 85%. These results show that using the ALIGN framework for data alignment in vertical federated learning helps improve the efficiency and accuracy of subsequent model training.
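
The double encryption of sample IDs can be illustrated with a minimal sketch. The abstract does not specify which interchangeable encryption algorithm ALIGN uses, so the sketch swaps in a standard commutative construction, modular exponentiation over a prime field, purely for illustration. The modulus `P`, the helper names `keygen`, `hash_id`, and `encrypt`, and the sample IDs are all assumptions and are not taken from the framework. The sketch shows only why twice-encrypted IDs can be matched without exposing plaintext IDs; it does not reproduce the ciphertext blinding, feature splicing, or secret sharing steps, and unlike ALIGN it does not hide the intersection itself.

```python
import hashlib
import secrets
from math import gcd

# Toy modulus (assumption): the Mersenne prime 2^127 - 1.
# A real deployment would use a vetted group and a cryptographic library.
P = 2**127 - 1


def keygen() -> int:
    """Pick a secret exponent coprime to P - 1, so encryption is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if gcd(k, P - 1) == 1:
            return k


def hash_id(sample_id: str) -> int:
    """Map a sample ID to a nonzero element modulo P."""
    digest = hashlib.sha256(sample_id.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def encrypt(value: int, key: int) -> int:
    """Commutative encryption E_k(x) = x^k mod P, so E_a(E_b(x)) == E_b(E_a(x))."""
    return pow(value, key, P)


# Hypothetical sample IDs held by two participants.
ids_a = ["u1", "u2", "u3", "u5"]
ids_b = ["u2", "u3", "u4"]
key_a, key_b = keygen(), keygen()

# Each party encrypts its own IDs first; the other party encrypts them again.
double_a = {encrypt(encrypt(hash_id(i), key_a), key_b) for i in ids_a}
double_b = {encrypt(encrypt(hash_id(i), key_b), key_a) for i in ids_b}

# Equal plaintext IDs produce equal double ciphertexts, so the match can be
# computed on ciphertexts alone.
common = double_a & double_b
print(len(common))  # 2 (the ciphertexts of "u2" and "u3")
```

Because the exponents commute, the order in which the two parties apply their keys does not matter, which is what allows the comparison to be carried out on ciphertexts only.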