A Bayesian Method for High-Dimensional Data Governance in Distributed Databases
A distributed database that lacks high-dimensional data governance capability incurs higher network transmission costs and additional network overhead. To identify abnormal data quickly, retrieve target data in a timely manner, and predict target problems accurately, a Bayesian method for high-dimensional data governance in distributed databases is proposed. The method first uses a segmentation factor to delete invalid data blocks in the distributed database, which reduces the amount of high-dimensional data migrated during task execution. It then uses the Bayesian method to select, from the perspectives of both the prior distribution and the posterior distribution, the variables that show a strong correlation between the outcome variables and the starting variables. Finally, under the given conditions of the specific scenario corresponding to the distributed database, the locations of the variables are determined and a high-dimensional data control chart is designed for the distributed environment, thereby achieving high-dimensional data governance in the distributed database. The experimental results show that the proposed method is effective.