A Fusion Matrix-based Study on Text Clustering of Document Retrieval Results
Purpose/Significance: This study addresses deficiencies in the semantic representation of medical texts and clusters the retrieval results of the PubMed database.

Method/Process: The paper proposes a method that constructs a fusion matrix from the Jaccard coefficient and TF-IDF weights. Similarity relations among phrases, among documents, and between phrases and documents are combined into the fusion matrix, and several clustering algorithms are applied to group a collection of documents retrieved from PubMed. Category annotations are then created to describe the meaning of each cluster of documents.

Result/Conclusion: Experimental results show that fusion-matrix-based clustering performs better at grouping the document sets, and the high-frequency words extracted for the category descriptions clearly distinguish the meanings of the categories. The fusion matrix design is therefore effective for clustering academic texts and describing the resulting clusters.
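The construction described in Method/Process can be illustrated with a minimal Python sketch. The abstract does not give the exact layout of the fusion matrix, so this sketch assumes a symmetric block matrix with three parts: document-document Jaccard similarity over each document's phrase set, phrase-phrase Jaccard similarity over the sets of documents containing each phrase, and TF-IDF weights as the document-phrase block. Several further assumptions apply. Phrases are taken as already extracted (single tokens here), and documents are non-empty. Plain k-means on the document rows stands in for the unspecified clustering algorithms, and category descriptions are the most frequent phrases in each cluster. The helper names `tfidf`, `jaccard`, `fusion_matrix`, `kmeans`, and `describe`, along with the toy documents, are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs, vocab):
    """Document-by-phrase TF-IDF (tf = relative frequency, idf = ln(N/df)).
    Assumes every document is non-empty."""
    n = len(docs)
    df = Counter(p for d in docs for p in set(d))
    rows = []
    for d in docs:
        counts = Counter(d)
        rows.append([counts[p] / len(d) * math.log(n / df[p]) for p in vocab])
    return rows

def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def fusion_matrix(docs):
    """Hypothetical layout: symmetric block matrix [[S_dd, W], [W^T, S_pp]]."""
    vocab = sorted({p for d in docs for p in d})
    doc_sets = [set(d) for d in docs]
    # Documents containing each phrase, used for phrase-phrase Jaccard.
    occ = {p: {i for i, s in enumerate(doc_sets) if p in s} for p in vocab}
    w = tfidf(docs, vocab)                                   # document-phrase block
    s_dd = [[jaccard(a, b) for b in doc_sets] for a in doc_sets]
    s_pp = [[jaccard(occ[p], occ[q]) for q in vocab] for p in vocab]
    top = [s_dd[i] + w[i] for i in range(len(docs))]
    bottom = [[w[i][j] for i in range(len(docs))] + s_pp[j]
              for j in range(len(vocab))]
    return top + bottom, vocab

def kmeans(rows, k, iters=20):
    """Stand-in clusterer: k-means with deterministic farthest-point init."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(dist(r, c) for c in centers)))
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def describe(docs, labels, top_n=2):
    """Describe each cluster by its most frequent phrases."""
    out = {}
    for c in sorted(set(labels)):
        counts = Counter(p for d, lab in zip(docs, labels) if lab == c for p in d)
        out[c] = [p for p, _ in counts.most_common(top_n)]
    return out

# Hypothetical toy corpus standing in for PubMed retrieval results.
docs = [
    ["insulin", "diabetes", "glucose"],
    ["diabetes", "glucose", "insulin", "diabetes"],
    ["tumor", "chemotherapy", "cancer"],
    ["cancer", "tumor", "radiotherapy"],
]
F, vocab = fusion_matrix(docs)
labels = kmeans(F[:len(docs)], k=2)   # cluster the document rows of F
print(labels)                         # [0, 0, 1, 1]
print(describe(docs, labels))
```

Clustering only the document rows keeps the sketch short. Each document is then represented by its similarity to the other documents together with its TF-IDF weights over all phrases. A graph-based method such as spectral clustering on the full matrix would be an equally plausible reading of the abstract.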