Computer Applications and Software (计算机应用与软件), 2024, Vol. 41, Issue 10: 282-286, 318. DOI: 10.3969/j.issn.1000-386x.2024.10.042

K-MEANS TEXT CLUSTERING ALGORITHM BASED ON THE CENTER POINT OF SUBJECT WORD VECTOR

Ji Duo 1, Liu Yunzhao 1, Peng Ruxiang 2, Kong Huafeng 3

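Author information

  • 1. Criminal Investigation Police University of China, Shenyang, Liaoning 110854
  • 2. The Third Research Institute of the Ministry of Public Security, Shanghai 201204
  • 3. Wuhan Business University, Wuhan, Hubei 430056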

Abstract

K-means remains one of the most popular clustering algorithms because of its low time complexity and fast running speed. However, it requires the number of clusters and the initial cluster centers to be given in advance, and how well these are chosen directly affects the final clustering result. This paper studies the selection of both the initial cluster centers and the iterative cluster centers: the initial centers are selected according to a decision graph, and the subject (topic) word vector of each cluster is used instead of the mean as the iterative cluster center. Experiments show that the proposed initial point selection method selects initial points accurately, and that using the subject word vector as the iterative cluster center largely avoids the influence of noise points and noise features, substantially improving the performance of K-means.
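The abstract does not give the algorithmic details, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the decision graph is meant in the density-peaks sense (a local density `rho` per point and a distance `delta` to the nearest denser point), and it approximates the subject word vector of a cluster by keeping only the `top_m` highest-weight terms of the cluster mean. The function names `decision_graph_init`, `topic_word_center`, and `kmeans_topic`, and the parameters `dc` and `top_m`, are hypothetical.

```python
import numpy as np

def decision_graph_init(X, k, dc):
    """Assumed density-peaks style selection: rank points by rho * delta."""
    n = len(X)
    # Pairwise Euclidean distances between all document vectors.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # rho: number of neighbours within the cutoff dc (the point itself excluded).
    rho = (dist < dc).sum(axis=1) - 1
    # Visit points from densest to sparsest; ties are broken by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    # The densest point has no denser neighbour: use its largest distance.
    delta[order[0]] = dist[order[0]].max()
    for pos in range(1, n):
        i = order[pos]
        # delta: distance to the nearest point that ranks as denser.
        delta[i] = dist[i, order[:pos]].min()
    # Points with high rho and high delta stand out on the decision graph.
    return np.argsort(-(rho * delta))[:k]

def topic_word_center(members, top_m):
    """Assumed subject word vector: top_m heaviest terms of the cluster mean."""
    mean = members.mean(axis=0)
    center = np.zeros_like(mean)
    top = np.argsort(-mean)[:top_m]
    center[top] = mean[top]  # all other (possibly noisy) terms stay at zero
    return center

def kmeans_topic(X, k, dc, top_m, n_iter=20):
    """K-means loop whose update step uses topic_word_center, not the plain mean."""
    centers = X[decision_graph_init(X, k, dc)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = X[labels == c]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[c] = topic_word_center(members, top_m)
    return labels, centers
```

Here `X` would be a dense document-term weight matrix (for example TF-IDF), with one row per document. Zeroing low-weight terms in the center is one plausible reading of how noise features could be kept out of the iterative center; the paper itself may define the subject word vector differently.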

Key words

K-means/Initial point/Decision graph/Iterative class center/Topic word vector


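Funding

National Key Research and Development Program of China (2018YFC0830401)

Open Project of the Liaoning Collaborative Innovation Center for Network Security Law Enforcement ()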

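Publication year

2024
Computer Applications and Software (计算机应用与软件)
Shanghai Institute of Computing Technology; Shanghai Development Center of Computer Software Technology

CSTPCD; PKU Core (北大核心)
Impact factor: 0.615
ISSN: 1000-386X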