Research on a Text Semantic Enhancement Topic Crawler Integrating BTM and TextCNN
In an era of information overload, retrieving the required information efficiently and accurately is a major challenge. Topic crawlers are an effective way to collect information about a particular domain. Conventional topic similarity computation operates at the word level and ignores the semantic features of the text as a whole, which degrades both the precision and the recall of the crawler system. To address this problem, a topic crawler method based on BTM and TextCNN is proposed, in which the content topic discrimination module is treated as a text classification problem. The semantic information of the text is enhanced by fusing the text topic vector obtained from BTM with Word2vec word vectors. The method uses a convolutional neural network to improve the accuracy of the discrimination module, and the fused representation alleviates the inadequate representation of text features in convolutional neural networks. Experimental results show that the model fusing BTM and TextCNN achieves an average classification precision of 93.7% on the test set of the open-source news text classification dataset (THUCNews) and 91.3% on the test set of a real paper dataset, which is 0.6 and 1.3 percentage points higher, respectively, than the TextCNN baseline model.
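
The fusion step described above can be illustrated with a minimal sketch. It assumes that fusion means concatenating the document-level topic vector onto every word vector before convolution; the abstract does not state the fusion operator, so this choice, the function names `fuse_features` and `conv_max_pool`, and all numeric values are hypothetical and only serve to show the data flow into a TextCNN-style convolution with max-over-time pooling.

```python
from typing import List

Vector = List[float]


def fuse_features(word_vectors: List[Vector], topic_vector: Vector) -> List[Vector]:
    """Append the document-level topic vector to every word vector.

    Assumption: fusion is plain concatenation; the source does not
    specify the operator actually used.
    """
    return [wv + topic_vector for wv in word_vectors]


def conv_max_pool(matrix: List[Vector], kernel: List[Vector]) -> float:
    """Slide one kernel over consecutive rows (words), apply ReLU,
    then keep the maximum activation (max-over-time pooling)."""
    height = len(kernel)
    if len(matrix) < height:
        raise ValueError("text is shorter than the kernel height")
    activations = []
    for start in range(len(matrix) - height + 1):
        window = matrix[start:start + height]
        total = sum(
            w * x
            for k_row, m_row in zip(kernel, window)
            for w, x in zip(k_row, m_row)
        )
        activations.append(max(0.0, total))  # ReLU
    return max(activations)


# Toy inputs (hypothetical values, not taken from the paper).
word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 words, 2-dim embeddings
topic_vector = [0.5, 0.5]                             # 2-topic distribution
fused = fuse_features(word_vectors, topic_vector)     # 3 rows, 4 columns each

# One kernel spanning 2 words; it weights both embedding and topic columns.
kernel = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
feature = conv_max_pool(fused, kernel)
print(fused[0], feature)
```

In a full model, many such kernels of several heights would be learned, and their pooled features would feed a softmax classifier that decides whether a page is on topic. Concatenation widens every row of the input matrix, so each kernel sees topic information alongside the local word context.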