Simulation of Big Data Random-Walk Sampling Considering Attribute Correlation
Big data is typically non-random, and sampling bias may exist in massive data sets: some groups may be oversampled or undersampled, which lowers the accuracy of results. To address this, a random-walk sampling algorithm for big data based on attribute correlation is proposed. First, after the neighborhood relationship matrix of the big data is obtained, the single-attribute neighborhood relationship matrix is derived using a sorting-based approach. Next, the neighborhood relationship matrices of the different attributes and the correlation between attributes are calculated to obtain the attribute reduction results. Then, interval density similarity is used to adjust the intervals and construct a variable grid space. Finally, the grid space is combined with a density deviation sampling algorithm to complete the random-walk sampling of the big data. The simulation results show that the algorithm can significantly improve sample quality. Energy consumption is noticeably lower, with a maximum of only 280 Wh. The method can obtain more accurate random-walk sampling results for big data.
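The final step, combining a grid space with density-biased sampling, can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a fixed uniform grid instead of the variable grid built from interval density similarity, and it assumes the common density-biased rule in which a point in a cell holding n points is kept with probability proportional to n^(-e). The names `grid_cell`, `density_biased_sample`, `cell_width`, and `exponent` are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def grid_cell(point, cell_width):
    """Map a point (tuple of floats) to the integer index of its grid cell."""
    return tuple(int(x // cell_width) for x in point)


def density_biased_sample(points, cell_width, sample_size, exponent=0.5, seed=0):
    """Draw roughly `sample_size` points, favouring sparse grid cells.

    Assumption: a point in a cell with n members is kept with probability
    proportional to n ** (-exponent). exponent=0 gives uniform sampling;
    exponent=1 gives every occupied cell the same expected share.
    """
    rng = random.Random(seed)

    # Group the points by the grid cell they fall into.
    cells = defaultdict(list)
    for p in points:
        cells[grid_cell(p, cell_width)].append(p)

    # Normalising constant so the expected sample size equals sample_size
    # (as long as no per-point probability is clipped at 1).
    norm = sum(len(members) ** (1.0 - exponent) for members in cells.values())

    sample = []
    for members in cells.values():
        prob = min(1.0, sample_size * len(members) ** (-exponent) / norm)
        sample.extend(p for p in members if rng.random() < prob)
    return sample


if __name__ == "__main__":
    dense = [(0.1 + 0.0001 * i, 0.2) for i in range(1000)]  # one crowded cell
    sparse = [(5.0 + 0.01 * i, 5.5) for i in range(10)]     # one thin cell
    picked = density_biased_sample(dense + sparse, 1.0, 20, exponent=1.0)
    kept_sparse = sum(1 for p in picked if p in sparse)
    print(len(picked), "points sampled,", kept_sparse, "from the sparse cell")
```

With `exponent=1.0` the sparse cell is kept in full while the crowded cell is thinned heavily, which is the behaviour that counteracts the oversampling of dense groups described above. A variable grid would change only how `grid_cell` assigns points to cells.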