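In recent years, deep learning has played a major role in short text clustering, and the recently proposed Short Text Clustering (STC) algorithm has achieved good results in this area. To further improve clustering accuracy and optimize algorithm performance, an improved stochastic neighbor embedding algorithm based on the exponential function is proposed. The algorithm uses an exponential function to measure the gap between sample points and cluster centers, magnifying the differences between features, and in the later stage uses the k-means++ algorithm to determine the cluster centers and the number of clusters in advance. Experiments on the Stackoverflow dataset show that the stochastic exponential embedding clustering model (e-STC) outperforms the original STC model in both accuracy and normalized mutual information, with a relative improvement of 3.2% in accuracy and 2.9% in mutual information.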
STOCHASTIC NEIGHBOR EMBEDDING SHORT TEXT CLUSTERING IMPROVED BY EXPONENTIAL FUNCTION
In recent years, deep learning has played an important role in short text clustering, and the recently proposed Short Text Clustering (STC) algorithm has achieved good results in this field. To further improve clustering accuracy and optimize algorithm performance, an improved stochastic neighbor embedding algorithm based on the exponential function (e-STC) is proposed. The algorithm uses an exponential function to measure the gap between sample points and cluster centers, which magnifies the differences between features. In the later stage, the K-Means++ algorithm is used to determine the cluster centers and the number of clusters in advance. Experiments on the Stackoverflow dataset show that e-STC outperforms the original STC algorithm in both accuracy and normalized mutual information (NMI), with relative improvements of 3.2% in accuracy and 2.9% in NMI.
Short text clustering; Deep clustering; Stochastic neighbor embedding; Feature extraction
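
The abstract does not state the exact form of the exponential measure, so the following is only a minimal sketch of one plausible reading. It assumes the soft assignment of a sample to a cluster center is proportional to exp(-d²/alpha), where d² is the squared Euclidean distance, and it seeds the centers with k-means++. The function names, the `alpha` scale parameter, and the toy two-cluster data are all hypothetical and are not taken from the e-STC work. The number of clusters `k` is passed in by hand here.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center.
    Assumes the points are not all identical (weights must not all be zero)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def exp_soft_assign(points, centers, alpha=1.0):
    """Assumed exponential kernel: q_ij proportional to exp(-d_ij^2 / alpha).
    Each row is normalized so the assignments of one sample sum to 1."""
    q = []
    for p in points:
        logits = [-sq_dist(p, c) / alpha for c in centers]
        m = max(logits)  # subtract the max for numerical stability
        w = [math.exp(v - m) for v in logits]
        s = sum(w)
        q.append([v / s for v in w])
    return q


# Toy data (hypothetical): two small 2-D blobs standing in for embedded texts.
rng = random.Random(0)
points = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(20)]
points += [(rng.gauss(5, 0.1), rng.gauss(5, 0.1)) for _ in range(20)]

centers = kmeans_pp_init(points, 2, rng)
q = exp_soft_assign(points, centers)
```

Because the kernel decays exponentially with squared distance, a sample that is slightly closer to one center receives a much larger share of the assignment than it would under a heavier-tailed kernel, which is one way to read the claim that the exponential function magnifies feature differences.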