
Text Clustering Based on Deep Contrastive Learning

Unsupervised clustering aims to partition data into meaningful or useful clusters according to distances in the representation space, but different categories often overlap in that space. To separate the categories more cleanly, this work starts from the instance contrastive learning model (SCCL), changes its activation function to Tanh, and replaces the single-layer perceptron (SLP) with a multilayer perceptron, yielding the Clustering with Deep Contrastive Learning model (CDCL). The model first feeds the original Chinese long-text dataset into BERT, which serves as the neural feature-extraction layer; it then passes all extracted features to the instance-wise contrastive learning (Instance-CL) layer, which optimizes them; finally, K-means clusters the optimized features. On the THUCNews dataset, CDCL improves Chinese long-text clustering accuracy by 10% to 25% compared with unsupervised clustering. The model better promotes the effective separation of data whose categories overlap, and its experimental results are significantly better than those of other existing related models.
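The three stages named in the abstract (feature extraction, an Instance-CL layer with a multilayer Tanh head, and K-means) can be illustrated with a minimal NumPy sketch. Several things here are assumptions rather than details from the paper: random vectors stand in for BERT features, the NT-Xent loss is used as one common instance-wise contrastive objective, and the layer sizes, temperature, noise-based "views", and all function names are hypothetical. Only the forward computation is shown; gradient-based training of the head is omitted.

```python
import numpy as np


def mlp_head(x, weights, biases):
    """Multilayer projection head with Tanh after every hidden layer."""
    h = x
    for i, (w, b) in enumerate(zip(weights, biases)):
        h = h @ w + b
        if i < len(weights) - 1:  # no activation on the output layer
            h = np.tanh(h)
    return h


def instance_contrastive_loss(z1, z2, temperature=0.5):
    """NT-Xent loss: row i of z1 and row i of z2 are two views of one text."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never compared with itself
    # Index of the positive partner for every row of the stacked batch.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - m - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())


def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def demo():
    rng = np.random.default_rng(0)
    # Stand-in for BERT sentence features (assumption: 20 texts, 32 dims).
    features = rng.normal(size=(20, 32))
    # Two noisy copies act as the two augmented views (assumption).
    view1 = features + 0.01 * rng.normal(size=features.shape)
    view2 = features + 0.01 * rng.normal(size=features.shape)
    # Hypothetical two-layer head: 32 -> 16 -> 8.
    weights = [0.1 * rng.normal(size=(32, 16)), 0.1 * rng.normal(size=(16, 8))]
    biases = [np.zeros(16), np.zeros(8)]
    z1 = mlp_head(view1, weights, biases)
    z2 = mlp_head(view2, weights, biases)
    print("contrastive loss:", instance_contrastive_loss(z1, z2))
    print("cluster labels:", kmeans(mlp_head(features, weights, biases), k=3))


if __name__ == "__main__":
    demo()
```

In a real pipeline the loss would be minimized with an autodiff framework so that the head (and possibly the encoder) learns features in which categories overlap less, and K-means would then run on those learned features.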

SCCL; CDCL; long text clustering; K-means; Instance-CL

XU Guixian (胥桂仙), LI Xiaorong (李晓荣)


School of Information Engineering, Minzu University of China, Beijing 100081

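instance contrastive learning model; deep contrastive learning clustering model; long text clustering; K-means; instance contrastive learning layer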

Beijing Social Science Foundation Project

20YYB011

2024

Journal of Minzu University of China (Natural Sciences Edition)
Minzu University of China


Impact factor: 0.462
ISSN:1005-8036
Year, volume (issue): 2024, 33(3)