Constructing a Topic Generation Model Based on Chinese Text Category Information
[Purpose/significance] Traditional LDA models describe semantics insufficiently and produce topics with weak semantic coherence when used for text topic recognition. To address these problems, this paper integrates text category information into the LDA model, forming a new topic generation model based on Chinese text category information, the CLCI-LDA model, which provides a new tool for text analysis and knowledge discovery in the field of data mining. [Method/process] To extract topics with the CLCI-LDA model, first, the deep learning model Sentence-BERT transforms each text into a sentence embedding vector, which is concatenated with the document-topic vector generated by the LDA model to improve the semantic richness and relevance of the text vector; then, the K-means clustering algorithm clusters the texts to obtain their category information; finally, high-frequency keywords are obtained for each category cluster based on topic word frequency, and the topics are condensed from them. [Result/conclusion] A literature topic extraction experiment was conducted on the "smart libraries" research field in China to compare the application effects of the CLCI-LDA model and the traditional LDA model. The results indicate that the CLCI-LDA model is better at obtaining topic words with semantic information and that its topic coherence index is superior to that of the traditional LDA model. [Innovation/limitation] Compared with traditional LDA models, the CLCI-LDA model has advantages in the depth of text semantic representation and the rationality of topic condensation. However, the new model has shortcomings in parameter tuning, its depth of semantic understanding needs further improvement, and its universality still needs to be tested.
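The three steps in the method summary (vector concatenation, K-means clustering, per-cluster keyword frequency) can be illustrated with a minimal sketch. It is not the authors' implementation: the sentence embeddings and document-topic vectors below are hand-written toy values standing in for Sentence-BERT and LDA output, the tokenized documents are invented, and the helper names `concat_vectors`, `kmeans`, and `top_keywords` are hypothetical. The clustering is a plain standard-library K-means written only for illustration.

```python
import math
import random
from collections import Counter


def concat_vectors(sentence_vecs, topic_vecs):
    """Concatenate each sentence embedding with its document-topic vector."""
    return [list(s) + list(t) for s, t in zip(sentence_vecs, topic_vecs)]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd-style K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def top_keywords(token_docs, labels, n=2):
    """Count words per cluster and keep the n most frequent in each."""
    counts = {}
    for tokens, label in zip(token_docs, labels):
        counts.setdefault(label, Counter()).update(tokens)
    return {label: [w for w, _ in c.most_common(n)] for label, c in counts.items()}


# Toy stand-ins (assumptions): in the described method these would come from
# Sentence-BERT (sentence embeddings) and LDA (document-topic distributions).
sentence_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
topic_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
token_docs = [
    ["smart", "library", "smart", "service"],
    ["smart", "library"],
    ["rfid", "sensor", "rfid"],
    ["rfid", "sensor", "network"],
]

text_vecs = concat_vectors(sentence_vecs, topic_vecs)  # step 1: enriched vectors
labels = kmeans(text_vecs, k=2)                        # step 2: category information
keywords = top_keywords(token_docs, labels, n=2)       # step 3: keywords per cluster
print(labels, keywords)
```

In a realistic setting the toy vectors would be replaced by actual model output, and the number of clusters `k` would need tuning, which is consistent with the parameter-tuning limitation the abstract acknowledges.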