Analysis of Multi-dimensional Data De-Duplication Clustering Algorithm Based on Sparse Self-Coding
As information technology continues to develop, the volume and variety of data increase day by day. To address the high dimensionality of data sets and the difficulty of extracting effective information when much of the data is duplicated, this paper proposes a multi-dimensional data clustering algorithm based on an improved sparse autoencoder (self-encoder). The algorithm has two major parts: data processing and clustering analysis. Data processing first uses the layer-by-layer greedy principle in S-SAE to reduce the high-dimensional data set to a 6-dimensional data set for each group. A mapped-value matching mechanism then cleans duplicate data from the reduced data set, replacing the cleaned values with 0. The processed data are then passed to the K-Means++ clustering algorithm for clustering analysis. Finally, a TS-SAE-K-Means++ multi-dimensional data clustering model is constructed, and its optimal parameter settings are derived by optimization analysis. Simulation comparisons against different baseline algorithm combinations show that TS-SAE-K-Means++ outperforms the other combinations on both the clustering silhouette coefficient S and the model F1 score. This indicates that the proposed algorithm has an advantage in extracting effective information from high-dimensional data.
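The cleaning and clustering stages described above can be illustrated with a minimal sketch. It is not the TS-SAE-K-Means++ model itself: the S-SAE reduction step is omitted and the input is assumed to be already reduced to 6 dimensions. The function names, the rounding-based key used as the "mapped value", the choice to drop zeroed rows before clustering, and the toy data are all assumptions made for illustration. The score computed at the end is the standard silhouette coefficient, on the assumption that this is the coefficient S referred to in the text.

```python
import math
import random

def zero_out_duplicates(rows, decimals=6):
    """Keep the first occurrence of each row; replace later repeats with zeros."""
    seen, cleaned = set(), []
    for row in rows:
        key = tuple(round(v, decimals) for v in row)  # hypothetical "mapped value"
        if key in seen:
            cleaned.append([0.0] * len(row))
        else:
            seen.add(key)
            cleaned.append(list(row))
    return cleaned

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """K-Means++ seeding: sample each new center proportionally to D(x)^2."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            raise ValueError("fewer distinct points than clusters")
        centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers

def kmeans(points, k, iters=50, seed=0):
    """Lloyd iterations started from K-Means++ seeds."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # leave an empty cluster's center where it is
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.sqrt(sq_dist(p, q)))
        own = dists.pop(labels[i], None)
        if not own or not dists:  # singleton cluster or only one cluster
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dists.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy 6-dimensional codes: two well-separated groups plus two exact duplicates.
group_a = [[0.1 * i + 1.0] * 6 for i in range(5)]
group_b = [[0.1 * i + 9.0] * 6 for i in range(5)]
codes = group_a + group_b + [group_a[0], group_b[1]]

cleaned = zero_out_duplicates(codes)
# Assumption: zeroed rows are excluded before clustering.
kept = [row for row in cleaned if any(v != 0.0 for v in row)]
labels, centers = kmeans(kept, k=2)
score = silhouette(kept, labels)
print(labels, round(score, 3))
```

Zeroing duplicates instead of deleting them keeps the row count and row positions of the reduced data set unchanged, which is one plausible reason for the replace-by-0 step. Whether the zeroed rows then take part in clustering is not stated above, so the filter before `kmeans` is a choice made for this sketch only.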