Research on Differential Privacy High Dimensional Data Publishing Technology Based on Bayesian Networks
Improving data availability while implementing privacy protection is challenging in high-dimensional structured data publishing;however,the classic PrivBayes algorithm can solve this issue.To further reduce computational costs and improve data availability,a differential privacy data-publishing algorithm based on Bayesian networks,ELPrivBayes,is proposed.It analyzes the theoretical computational cost of the Bayesian network structure in the learning stage,constructs a correlation matrix for storing Mutual Information(MI)between attributes,avoids redundant calculations of MI in the iterative process of structural learning algorithms,and reduces time complexity.Based on the Average MI(AMI),the order in which nodes enter the Bayesian network is optimized,and the expected mutual information contribution of the exponential mechanism in the iterative process of structural learning increases,thereby improving the statistical approximation between the generated and original datasets.The low sensitivity of the network structure quality to the selection of the first node is analyzed empirically.Experimental results on four typical datasets show that,compared with the classical PrivBayes algorithm and its improved solutions,the computational cost in the structural learning stage is reduced by 97%-99%,the MI captured based on the exponential mechanism is improved by 14%-67%,the average variation distance between the generated and original datasets is reduced by 32%-40%,and the accuracy of the constructed Support Vector Machine(SVM)classifier is improved by 4%-5%.Moreover,when e≤0.8,the availability improvement of data generated using the ELPrivBayes algorithm is more significant.
data publishingBayesian networkdifferential privacyprivacy protectioncorrelation matrixAverage Mutual Information(AMI)