K-MEANS TEXT CLUSTERING ALGORITHM BASED ON THE CENTER POINT OF SUBJECT WORD VECTOR
K-means is one of the most popular clustering algorithms because of its low time complexity and fast running speed. However, the K-means algorithm requires the number of clusters and the initial center points to be specified in advance, and these choices directly affect the final clustering result. This paper studies the selection of both the initial cluster centers and the iterative cluster centers. The initial cluster centers are selected according to a decision graph, and the subject word vector of each cluster is used instead of the mean value as the iterative cluster center. Experiments show that the proposed method selects the initial points accurately, and that using the subject word vector as the iterative cluster center avoids the influence of noise points and noise features and greatly improves K-means clustering performance.
Keywords: K-means; Initial point; Decision graph; Iterative class center; Topic word vector
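
The two ideas summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it rests on assumptions that the abstract does not confirm: the decision graph is taken to be of the density-peaks kind (a local density rho for each point, and a distance delta to the nearest denser point, with candidates ranked by rho * delta), and the subject word vector is taken to be the cluster mean restricted to its m highest-weighted terms. The function names `decision_graph_centers` and `topic_word_center`, the cutoff `dc`, the Gaussian density kernel, and the parameter `m` are all hypothetical.

```python
import numpy as np


def decision_graph_centers(X, k, dc):
    """Return indices of k candidate initial centers.

    Assumption: a density-peaks style decision graph, where points with
    both high local density (rho) and a large distance to any denser
    point (delta) are taken as centers.
    """
    # Pairwise Euclidean distances between all rows of X.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Local density with a Gaussian kernel (assumed); subtract the
    # self-contribution exp(0) = 1.
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0

    n = X.shape[0]
    delta = np.empty(n)
    for i in range(n):
        denser = np.where(rho > rho[i])[0]
        if denser.size > 0:
            # Distance to the nearest point of strictly higher density.
            delta[i] = dist[i, denser].min()
        else:
            # No denser point exists: use the largest distance instead.
            delta[i] = dist[i].max()

    # Rank points by rho * delta and keep the k best.
    score = rho * delta
    return np.argsort(score)[::-1][:k]


def topic_word_center(members, m):
    """Return a cluster center that keeps only the m strongest terms.

    Assumption: the subject word vector is the cluster mean with all
    but its m highest-weighted terms set to zero.
    """
    mean = members.mean(axis=0)
    center = np.zeros_like(mean)
    top = np.argsort(mean)[::-1][:m]
    center[top] = mean[top]
    return center
```

In a K-means loop built on these helpers, `decision_graph_centers` would be called once to pick the starting points, and `topic_word_center` would replace the usual mean in each update step, so that low-weight (possibly noisy) terms do not contribute to the center.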