Text Clustering Algorithm Combining Density and Partition
Document clustering is a classic application of clustering: it groups similar documents into the same category, which helps organize, summarize, and navigate text collections and can also improve classification performance. This paper uses the BERT model to represent each document as a high-dimensional vector. Traditional density-based clustering algorithms are not well suited to high-dimensional data sets. K-means, a partition-based clustering algorithm, can cluster documents effectively, but its performance depends heavily on the choice of initial center points. This paper proposes a new text clustering algorithm that combines density-based and partition-based clustering. First, suitable candidate center points are selected by density; next, the farthest-distance idea is used to select the initial cluster centers one by one; finally, a partition method is applied to cluster the data set. Experiments show that the new algorithm produces stable clustering results of good quality.
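The three steps named in the abstract (density-based candidate selection, farthest-distance choice of initial centers, partition-based clustering) can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so several details here are assumptions rather than the paper's method: density is taken to be the number of neighbors within a radius `eps`, candidates are the points whose density is at least the mean density, the first center is the densest point, and the partition step is plain K-means. The function names `init_centers` and `kmeans` are hypothetical, and the sample data are toy 2-D points standing in for BERT document vectors.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def init_centers(points, k, eps):
    """Pick k initial centers: filter by density, then spread by farthest distance."""
    # Assumed density definition: number of other points within radius eps.
    density = [sum(1 for q in points if dist(p, q) <= eps) - 1 for p in points]
    mean_density = sum(density) / len(points)
    # Assumed candidate rule: keep points at least as dense as the average,
    # so that isolated outliers cannot become initial centers.
    candidates = [i for i, d in enumerate(density) if d >= mean_density]
    # The first center is the densest candidate.
    chosen = [max(candidates, key=lambda i: density[i])]
    while len(chosen) < k:
        remaining = [i for i in candidates if i not in chosen]
        if not remaining:
            # Fallback when there are fewer candidates than clusters.
            remaining = [i for i in range(len(points)) if i not in chosen]
        # Farthest-distance idea: maximize the distance to the nearest chosen center.
        nxt = max(remaining,
                  key=lambda i: min(dist(points[i], points[c]) for c in chosen))
        chosen.append(nxt)
    return [list(points[i]) for i in chosen]

def kmeans(points, centers, max_iter=100):
    """Standard K-means (Lloyd's algorithm) started from the given centers."""
    centers = [list(c) for c in centers]  # work on a copy
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy data: three tight groups plus one outlier at (50, 50).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 20), (1, 20), (0, 21), (1, 21),
          (50, 50)]
initial = init_centers(points, k=3, eps=1.5)
labels, centers = kmeans(points, initial)
```

Because the initial centers are chosen deterministically, repeated runs on the same data give the same partition, which is one plausible reading of the stability claimed in the abstract. In this sketch the outlier is excluded from the initial centers by the density filter but is still assigned to its nearest cluster by the K-means step.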