K-means Text Clustering Algorithm Based on the Center Point of Topic Word Vectors

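Chinese abstract (translated): K-means has long been one of the most popular clustering algorithms because of its low time complexity and fast running speed. However, it requires the number of clusters and the initial cluster centers to be specified in advance, and how well these are chosen directly affects the final clustering result. This paper studies how to select both the initial cluster centers and the iterative cluster centers. The initial centers are chosen from a decision graph, and each cluster's topic word vector replaces the mean as the iterative cluster center. Experiments show that the proposed initial-point selection method selects initial points accurately, and that using the topic word vector as the iterative cluster center effectively avoids the influence of noise points and noise features, substantially improving the performance of K-means.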
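
The decision-graph selection of initial centers can be illustrated with a short sketch. It assumes the density-peaks formulation of the decision graph: each document gets a local density ρ (a Gaussian kernel with cutoff distance `dc`) and a distance δ to its nearest denser document, and the documents with the largest γ = ρ·δ become initial centers. It also assumes cosine distance between dense term-weight vectors. The function names and the `dc` default are hypothetical. They are not taken from the paper.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two dense term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (na * nb)


def decision_graph(docs, dc):
    """Compute (rho, delta) for every document (assumed density-peaks form).

    rho[i]   -- local density: Gaussian kernel over distances, cutoff dc.
    delta[i] -- distance to the nearest document with higher density
                (ties broken by index); for the densest document, its
                largest distance to any other document.
    """
    n = len(docs)
    dist = [[cosine_distance(docs[i], docs[j]) for j in range(n)] for i in range(n)]
    rho = [sum(math.exp(-((dist[i][j] / dc) ** 2)) for j in range(n) if j != i)
           for i in range(n)]
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta


def select_initial_centers(docs, k, dc=0.5):
    """Return indices of the k documents with the largest gamma = rho * delta."""
    rho, delta = decision_graph(docs, dc)
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(docs)), key=lambda i: gamma[i], reverse=True)[:k]


# Two well-separated groups of toy term vectors (hypothetical data).
docs = [
    [1.0, 1.0, 0.0, 0.0], [1.0, 0.9, 0.0, 0.0], [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.9], [0.0, 0.0, 0.9, 1.0],
]
print(sorted(select_initial_centers(docs, 2)))
```

On a real corpus, a common approach is to plot ρ against δ. You then read off k as the number of points that stand clearly apart, which removes the need to fix k in advance.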

English title: K-MEANS TEXT CLUSTERING ALGORITHM BASED ON THE CENTER POINT OF SUBJECT WORD VECTOR

English abstract: K-means is one of the most popular clustering algorithms because of its low time complexity and fast running speed. However, K-means requires the number of clusters and the initial center points to be given in advance, and this choice directly affects the final clustering result. This paper studies the selection of the initial cluster centers and of the iterative cluster centers. The initial cluster centers are selected according to a decision graph, and the subject word vector of each cluster is used instead of the mean as the iterative cluster center. Experiments show that the proposed method selects initial points accurately, and that using the subject word vector as the iterative cluster center effectively avoids the influence of noise points and noise features, greatly improving K-means clustering performance.
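
The abstract does not say how a cluster's topic word vector is built. The following is only a sketch under one assumption: the center keeps the `top_m` terms with the largest summed weight in the cluster, at their mean weight, and zeroes every other term. Under that assumption, low-weight noise features cannot pull the center away from the cluster's topic. Documents are assigned by cosine similarity. `topic_word_center`, `topic_kmeans`, and `top_m` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two dense term-weight vectors (0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def topic_word_center(members, top_m):
    """Hypothetical topic-word center: keep only the top_m terms with the
    largest summed weight in the cluster (at their mean weight); zero the rest."""
    dim = len(members[0])
    totals = [sum(doc[t] for doc in members) for t in range(dim)]
    top_terms = sorted(range(dim), key=lambda t: totals[t], reverse=True)[:top_m]
    center = [0.0] * dim
    for t in top_terms:
        center[t] = totals[t] / len(members)
    return center


def topic_kmeans(docs, init_idx, top_m=2, max_iter=20):
    """K-means variant whose update step uses topic-word centers, not means."""
    centers = [list(docs[i]) for i in init_idx]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar center.
        new_labels = [max(range(len(centers)),
                          key=lambda c: cosine_similarity(doc, centers[c]))
                      for doc in docs]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: rebuild each non-empty cluster's topic-word center.
        for c in range(len(centers)):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:
                centers[c] = topic_word_center(members, top_m)
    return labels, centers


# Toy data: terms 0-1 and 2-3 form two topics; term 4 is a noise feature.
docs = [
    [1.0, 0.8, 0.0, 0.0, 0.1], [0.9, 1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0, 0.2],
    [0.0, 0.0, 1.0, 0.9, 0.1], [0.0, 0.0, 0.8, 1.0, 0.0], [0.0, 0.1, 1.0, 1.0, 0.2],
]
labels, centers = topic_kmeans(docs, init_idx=[0, 3])
print(labels)
```

In a full pipeline, `init_idx` would come from the decision-graph step. Here it is passed in directly. Because the noise term never ranks among a cluster's top terms, its weight in both final centers is zero, whereas a plain mean would carry it.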

Keywords (English):

K-means; Initial point; Decision graph; Iterative class center; Topic word vector

Authors:

季铎 (Ji Duo), 刘云钊 (Liu Yunzhao), 彭如香 (Peng Ruxiang), 孔华锋 (Kong Huafeng)

Affiliations:

Criminal Investigation Police University of China, Shenyang, Liaoning 110854

The Third Research Institute of the Ministry of Public Security, Shanghai 201204

Wuhan Business University, Wuhan, Hubei 430056

Keywords (Chinese, translated):

K-means; initial point; decision graph; iterative class center; topic word vector

Funding:

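National Key Research and Development Program of China; Open Project of the Liaoning Collaborative Innovation Center for Cybersecurity Law Enforcement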

Grant number:

2018YFC0830401

Publication year:

2024

DOI:

10.3969/j.issn.1000-386x.2024.10.042

Journal: 计算机应用与软件 (Computer Applications and Software)

Sponsors: Shanghai Institute of Computing Technology; Shanghai Development Center of Computer Software Technology

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.615

ISSN: 1000-386X

Year, volume (issue): 2024, 41(10)