Text Similarity Computing Based on Topic Model LDA
Latent Dirichlet Allocation (LDA) is an unsupervised model that has proven effective for latent topic modeling of text data in recent research. This paper presents a method that improves text similarity calculation by using the LDA model. The method models both the corpus and the individual texts with LDA. Parameters are estimated with Gibbs sampling, a Markov chain Monte Carlo (MCMC) method, and words are represented by their probabilities. The method mines the hidden relationships between topics and words in the texts, obtains the topic distribution of each text, and computes the similarity between texts. Finally, clustering experiments on the text similarity matrix are carried out to assess the clustering effect. Experimental results show that the method effectively improves both the accuracy of text similarity and the quality of clustering.
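
The pipeline summarized above (LDA fitted by Gibbs sampling, per-text topic distributions, then a text similarity matrix) can be illustrated with a minimal sketch. It is not the authors' implementation: the collapsed Gibbs sampler is the standard textbook form, and the similarity measure (one minus the Jensen-Shannon divergence), the hyperparameters alpha and beta, the function names and the toy corpus are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic distributions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word, and total words per topic.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic distribution of each document.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


def js_similarity(p, q):
    """Similarity in [0, 1]: one minus the Jensen-Shannon divergence (base 2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 1.0 - 0.5 * (kl(p, m) + kl(q, m))


def similarity_matrix(theta):
    """Pairwise similarity between all documents' topic distributions."""
    return [[js_similarity(p, q) for q in theta] for p in theta]


# Hypothetical toy corpus: each text is a list of tokens.
docs = [
    "cat dog pet cat dog".split(),
    "dog pet cat pet".split(),
    "stock market trade stock".split(),
    "market trade price stock".split(),
]
theta = lda_gibbs(docs, n_topics=2)
sim = similarity_matrix(theta)
```

The resulting matrix `sim` could then be passed to a clustering algorithm of the reader's choice to reproduce the kind of evaluation described above. Because Gibbs sampling is stochastic, the topic distributions depend on the seed and on the number of iterations.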