Robust Spectral Clustering Based on Density Distribution
Spectral clustering,as a classic clustering method based on graph theory,uses the similarity matrix to decompose the data or project the data into a low-dimensional space to achieve better data partition.In spectral clustering,the similarity matrix of data needs to be constructed first,and the similarity between data points is usually calculated by the Gaussian kernel function or k-nearest neighbors method.Then,the similarity matrix is transformed into a Laplacian matrix,and the eigendecomposition of the Laplacian matrix is carried out,and the eigenvectors are obtained and clustered by the k-means algorithm method.Finally,according to the clustering results,the data points belong to the cluster.Spectral clustering is of great significance in the field of data mining and pattern recognition.It is not only suitable for clustering problems,but also can be applied to graph segmentation,dimensionality reduction,feature selection and other fields,so it has a wide range of application values.However,the computational complexity of spectral clustering is high and may be limited when dealing with large-scale data sets.In addition,spectral clustering is sensitive to noise,because noisy data points may affect the construction of the similarity matrix and the calculation of the eigenvectors,resulting in instability and a decrease in the accuracy of the clustering results.Especially in the case of no noise preprocessing or denoising,spectral clustering may incorrectly divide noisy data points into a certain cluster,affecting the final clustering results.Therefore,when dealing with data containing noise,it is necessary to properly clean or denoise the data before using spectral clustering to improve the effect.To address these problems,this paper proposes a robust spectral clustering algorithm based on density distribution.Firstly,the noise points between subclusters have lower local density;therefore,this paper sets the noise coefficient to filter a small number of low-density noise points from the perspective of different density levels.Secondly,according to the characteristics of density peaks clustering,that is,dividing the data as much as possible can ensure the consistency of the data label within the subcluster,the newly proposed algorithm can achieve a balance between a smaller number of subclusters and a higher consistency of the label within the cluster,and achieve a better division of the data.Finally,based on the density distribution information between all clusters,including the density mean and density standard deviation,a new similarity measure is proposed to improve the clustering effect of spectral clustering on data sets with uneven density.In the proposed algorithm,the parameter in spectral clustering,the non-negative Gaussian kernel bandwidth,is replaced by the more easily adjusted k-nearest neighbors,which can select the optimal parameters of the algorithm in a more limited range.Experiments on synthetic data and real data fully demonstrate the effectiveness of the new algorithm in nine state-of-the-art improved algorithms.Under the premise of ensuring the clustering efficiency,the average values of accuracy,the adjusted rand index and the adjusted mutual information are increased by 10.02%,22.11%and 15.76%at least,respectively.