Measuring the similarity between a long text and a short text has a growing number of application scenarios, and judging the consistency of such text pairs can be abstracted as a text similarity comparison problem. The challenge is that short texts are sparse: it is difficult to determine which domain a short text belongs to, and it is also difficult to apply word embeddings to a specific text matching problem in general scenarios. To address this problem, this paper proposes a lightweight approach based on a topic model with text clustering, which can match general long and short texts without extra background knowledge. Experimental results on two typical test datasets show that the proposed method detects text similarity with high efficiency.
Natural language processing; Text matching; Topic model; Gibbs sampling