Online Topic Detection Method Based on Combination Similarity Dynamic Clustering and Word Entropy
[Research purpose] To achieve online detection and tracking of hot topics on the Internet and to improve the clustering performance of incremental clustering algorithms, a topic detection method based on combined similarity clustering is proposed. Topic word extraction and evolution tracking are achieved at the same time by calculating word entropy. [Research method] Named entity recognition is performed on each text with the CIFG-BiLSTM-CRF model, and the entity similarity between the text and each topic is calculated. Next, the maximum cosine similarity between the text's word vectors and the topic center is taken as the vector similarity of the text. The entity similarity and the vector similarity are then combined to determine the topic to which the text belongs. During clustering, a time window strategy is used to dynamically update the topic center and the member texts. In parallel, the word entropy of each text is calculated to build a word entropy sum list for each topic, which supports topic word extraction and evolution tracking. The experiment uses COVID-19 news data to perform online topic detection and presents the evolution and tracking of topic keywords. [Research conclusion] The experiment shows that, compared with traditional similarity calculation methods, combined similarity achieves better clustering performance, and the topic keywords extracted during clustering accurately reflect the topic content of the original data.
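
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: entity similarity is taken as Jaccard overlap, the two similarities are combined as a weighted sum with a hypothetical weight `alpha`, a text opens a new topic when its best score falls below a hypothetical `threshold`, the time window simply drops member texts older than `window` time units, and the entropy contribution of a word is taken as -p log2 p, with p its relative frequency in the text. Entities and word vectors are assumed to be precomputed (for example by an NER model and a word embedding model), and all function and field names are illustrative.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def entity_similarity(text_entities, topic_entities):
    """ASSUMPTION: Jaccard overlap between the two entity sets."""
    a, b = set(text_entities), set(topic_entities)
    return len(a & b) / len(a | b) if a | b else 0.0

def vector_similarity(word_vectors, center):
    """Maximum cosine similarity between any word vector and the topic center."""
    return max((cosine(v, center) for v in word_vectors), default=0.0)

def combined_similarity(text, topic, alpha=0.5):
    """ASSUMPTION: weighted sum of the two similarities; alpha is hypothetical."""
    return (alpha * entity_similarity(text["entities"], topic["entities"])
            + (1 - alpha) * vector_similarity(text["vectors"], topic["center"]))

def word_entropy(tokens):
    """ASSUMPTION: per-word contribution -p * log2(p), p = relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: -(c / total) * math.log2(c / total) for w, c in counts.items()}

def rebuild(topic):
    """Recompute center, entity set, and entropy sum list from current members."""
    members = topic["members"]
    vectors = [v for m in members for v in m["vectors"]]
    topic["center"] = [sum(col) / len(vectors) for col in zip(*vectors)]
    topic["entities"] = set().union(*(m["entities"] for m in members))
    sums = Counter()
    for m in members:
        sums.update(word_entropy(m["tokens"]))  # adds the per-word entropies
    topic["entropy_sums"] = sums

def assign(text, topics, alpha=0.5, threshold=0.4, window=3):
    """Single-pass assignment of one incoming text; returns its topic."""
    # Time window: drop expired members, then discard topics left empty.
    for t in topics:
        t["members"] = [m for m in t["members"]
                        if text["time"] - m["time"] < window]
    topics[:] = [t for t in topics if t["members"]]
    for t in topics:
        rebuild(t)  # simple but not efficient; fine for a sketch
    scored = [(combined_similarity(text, t, alpha), t) for t in topics]
    best_score, best = max(scored, key=lambda p: p[0], default=(0.0, None))
    if best is None or best_score < threshold:
        best = {"members": []}  # no topic is close enough: open a new one
        topics.append(best)
    best["members"].append(text)
    rebuild(best)
    return best

def top_words(topic, k=5):
    """Topic words: the k words with the largest summed entropy."""
    return [w for w, _ in topic["entropy_sums"].most_common(k)]
```

Each text is represented here as a dictionary with the keys `time`, `tokens`, `entities`, and `vectors`. Calling `assign` on texts in arrival order and then `top_words` on the returned topic after each step gives a rough picture of how the topic keywords evolve over time. The values of `alpha`, `threshold`, and `window` would need to be tuned on real data.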