Clustering Retrieval Results Using Fusion Matrix-Based Text Similarity Calculation

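Purpose/Significance To remedy shortcomings in the semantic representation of medical texts and to cluster PubMed database retrieval results. Method/Process A fusion matrix is constructed using the Jaccard coefficient and TF-IDF, capturing the similarity relations between phrases, between documents, and between phrases and document content. Clustering algorithms are trained to group the set of PubMed retrieval results, and category labels are then generated to describe the meaning of each cluster of documents. Result/Conclusion Clustering based on the fusion matrix performs well, and the high-frequency words extracted to describe the categories distinguish the category meanings well; the approach is effective for clustering retrieval-result texts.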
A Fusion Matrix-based Study on Text Clustering of Document Retrieval Results
Purpose/Significance To address deficiencies in the semantic representation of medical texts and to cluster the retrieval results of the PubMed database. Method/Process The paper proposes a method that constructs a fusion matrix using the Jaccard coefficient and TF-IDF. Similarity relations between phrases, between documents, and between phrases and document contents are combined to construct the fusion matrix, and several clustering algorithms are trained to group a collection of documents from the PubMed database. Category annotations are created to describe the meaning of each category of clustered documents. Result/Conclusion Experimental results show that fusion matrix-based clustering performs well in grouping the document sets, and the high-frequency words extracted for the category descriptions distinguish the meanings of the categories well, so the fusion matrix design is effective for clustering the texts of retrieval results.
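The abstract does not say how the three similarity relations are combined, so the following is only a minimal sketch under stated assumptions: phrases are approximated by single tokens, phrase-phrase and document-document similarity use the Jaccard coefficient, phrase-document weights use one common TF-IDF variant, and the three are stacked into a symmetric block matrix. The function names (`jaccard`, `tfidf`, `fusion_matrix`), the block layout, and the toy documents are all hypothetical and are not taken from the paper.

```python
import math


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tfidf(docs, vocab):
    """Phrase-by-document TF-IDF weights.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n = len(docs)
    rows = []
    for term in vocab:
        df = sum(1 for d in docs if term in d)
        idf = math.log(n / df) if df else 0.0
        rows.append([d.count(term) / len(d) * idf if d else 0.0 for d in docs])
    return rows


def fusion_matrix(docs):
    """Build a (phrases + documents) square matrix from three similarity blocks.

    Assumed layout (not from the paper):
        [ phrase-phrase Jaccard | phrase-document TF-IDF ]
        [ document-phrase TF-IDF | document-document Jaccard ]
    """
    vocab = sorted({t for d in docs for t in d})
    doc_sets = [set(d) for d in docs]
    # For each phrase, the set of indices of the documents that contain it.
    postings = [{i for i, s in enumerate(doc_sets) if t in s} for t in vocab]
    pp = [[jaccard(a, b) for b in postings] for a in postings]
    dd = [[jaccard(a, b) for b in doc_sets] for a in doc_sets]
    pd = tfidf(docs, vocab)
    dp = [list(col) for col in zip(*pd)]  # transpose of the TF-IDF block
    top = [pp[i] + pd[i] for i in range(len(vocab))]
    bottom = [dp[j] + dd[j] for j in range(len(docs))]
    return vocab, top + bottom


# Toy stand-ins for tokenized retrieval results (hypothetical data).
docs = [
    ["gene", "therapy", "cancer"],
    ["cancer", "screening"],
    ["gene", "expression"],
]
vocab, fused = fusion_matrix(docs)
```

Each row of `fused` can then be handed to a clustering algorithm as a feature vector; the last `len(docs)` rows correspond to the documents and the first `len(vocab)` rows to the phrases.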

Keywords: document retrieval; text clustering; fusion matrix; text similarity

Zhao Yueyang, Cui Lei


Library of Shengjing Hospital of China Medical University, Shenyang 110004

School of Health Management, China Medical University, Shenyang 110122


Funding: Liaoning Provincial Social Science Planning Fund (L20BTQ003)

2024

Journal of Medical Informatics
Chinese Academy of Medical Sciences

CSTPCD
Impact factor: 1.348
ISSN:1673-6036
Year, volume (issue): 2024, 45(3)
  • 22