Stochastic Neighbor Embedding Short Text Clustering Improved with an Exponential Function

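Abstract (translated from the Chinese): In recent years, deep learning has played a major role in short text clustering, and the recently proposed Short Text Clustering (STC) algorithm has achieved good results in this area. To further improve clustering accuracy and optimize algorithm performance, an improved stochastic neighbor embedding algorithm based on the exponential function is proposed. The algorithm uses an exponential function to measure the gap between sample points and cluster centers, which magnifies the differences between features, and in the later stage it uses the k-means++ algorithm to determine the cluster centers and the number of clusters in advance. Experiments on the Stackoverflow dataset show that the stochastic exponential embedding clustering model (e-STC) outperforms the original STC model in both accuracy and normalized mutual information, with a relative improvement of 3.2% in accuracy and 2.9% in mutual information.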

English title: Stochastic Neighbor Embedding Short Text Clustering Improved by Exponential Function

English abstract: In recent years, deep learning has played an important role in short text clustering. The recently proposed short text clustering (STC) algorithm has achieved good results in this field. To further improve clustering accuracy and optimize algorithm performance, an improved stochastic neighbor embedding algorithm based on the exponential function (e-STC) is proposed. The algorithm magnifies the differences between features by using an exponential function to calculate the gap between sample points and cluster centers. In the later stage, the k-means++ algorithm is used to determine the cluster centers and the number of clusters in advance. Experiments on the Stackoverflow dataset show that e-STC is superior to the original STC algorithm in terms of accuracy and normalized mutual information: accuracy improves by 3.2% and normalized mutual information by 2.9%, both in relative terms.
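The two ingredients named in the abstract, k-means++ seeding and an exponential measure of the gap between a sample and each cluster center, can be illustrated with a minimal sketch. The abstract does not give the formula, so the kernel below, q_ij proportional to exp(-alpha * ||z_i - mu_j||^2), is an assumption chosen for illustration and may differ from the e-STC definition. The function names, the `alpha` parameter, and the toy data are likewise hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng=None):
    """Standard k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    rng = rng or random.Random(0)
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # every point coincides with a chosen center
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centers


def exp_soft_assign(points, centers, alpha=1.0):
    """ASSUMED kernel (not taken from the paper): q_ij is proportional to
    exp(-alpha * ||z_i - mu_j||^2), normalized over the clusters j."""
    q = []
    for p in points:
        d2 = [sq_dist(p, c) for c in centers]
        d_min = min(d2)  # shift exponents for numerical stability
        w = [math.exp(-alpha * (d - d_min)) for d in d2]
        total = sum(w)
        q.append([x / total for x in w])
    return q


if __name__ == "__main__":
    # Hypothetical 2-D embeddings standing in for encoded short texts.
    embedded = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    centers = kmeans_pp_init(embedded, k=2)
    for row in exp_soft_assign(embedded, centers):
        print([round(v, 4) for v in row])
```

Because the exponential decays much faster with distance than a heavy-tailed kernel, small differences in distance translate into large differences in assignment weight, which is one plausible reading of "magnifies the differences between features". Here `k` is passed in by the caller; how the paper fixes the number of clusters is not stated in the abstract.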

English keywords:

Short text clustering; Depth clustering; Random neighbor embedding; Feature extraction

Authors:

Wang Xiaochen (汪晓晨), Song Shuni (宋叔尼)

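Author affiliations:

College of Sciences, Northeastern University, Shenyang, Liaoning 110819

Guangdong Peizheng College, Guangzhou, Guangdong 510830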

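Keywords (translated from the Chinese):

Short text clustering; deep algorithms; stochastic neighbor embedding; feature extraction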

Funding:

National Natural Science Foundation of China

Project number:

11801065

Publication year:

2024

DOI:

10.3969/j.issn.1000-386x.2024.03.035

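Computer Applications and Software (计算机应用与软件)

Shanghai Institute of Computing Technology; Shanghai Development Center of Computer Software Technology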

CSTPCD; Peking University Core Journal

Impact factor: 0.615

ISSN: 1000-386X

Year, volume (issue): 2024, 41(3)

Number of references: 28