Event clustering on social streams aims to group short texts by the events they describe. Existing event clustering models are either unsupervised or supervised. Unsupervised models suffer from poor performance, while supervised models require large amounts of labeled data. To address these issues, this paper proposes SemiEC, a semi-supervised incremental event clustering model trained on a small annotated dataset. The model encodes events with an LSTM, computes text similarity with a linear model, and then clusters short texts on social streams. In particular, it uses the samples generated by incremental clustering to retrain the model and reassign uncertain samples. Experimental results show that SemiEC outperforms traditional clustering algorithms.
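The incremental clustering loop described in the abstract can be illustrated with a minimal Python sketch. This is not the SemiEC implementation: the bag-of-words encoder and cosine similarity are stand-ins for the LSTM encoder and the learned linear similarity model, and the function names and the `join` and `uncertain` thresholds are hypothetical. The retraining step is not modeled; texts in the uncertain band are only set aside so that they could be reassigned later.

```python
from collections import Counter
import math


def vectorize(text):
    """Stand-in encoder: bag-of-words counts (the paper uses an LSTM)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Stand-in similarity: cosine (the paper uses a learned linear model)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_cluster(stream, join=0.5, uncertain=0.3):
    """Single-pass clustering with an uncertainty band (hypothetical thresholds).

    A text joins its most similar cluster when the similarity is >= join,
    is deferred when the similarity falls in [uncertain, join), and
    otherwise starts a new cluster.
    """
    clusters = []  # each cluster: {"centroid": Counter, "texts": [str, ...]}
    deferred = []  # uncertain texts, kept aside for later reassignment
    for text in stream:
        vec = vectorize(text)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= join:
            best["centroid"].update(vec)  # accumulate term counts
            best["texts"].append(text)
        elif best is not None and best_sim >= uncertain:
            deferred.append(text)
        else:
            clusters.append({"centroid": Counter(vec), "texts": [text]})
    return clusters, deferred


if __name__ == "__main__":
    posts = [
        "earthquake hits city",
        "earthquake hits coastal city",
        "team wins final match",
        "team wins final",
    ]
    found, unsure = incremental_cluster(posts)
    for cluster in found:
        print(cluster["texts"])
    print("deferred:", unsure)
```

In a semi-supervised setting such as the one the abstract describes, the confidently clustered texts would serve as additional training samples for the similarity model, and the deferred texts would be reassigned once the model is retrained.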
social media event clustering; incremental clustering; text similarity
Guo Hengrui, Wang Zhongqing, Li Peifeng, Zhu Qiaoming
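School of Computer Science and Technology, Soochow University, Suzhou, China
School of Computer Science and Technology, Soochow University, Suzhou, China; AI Research Institute, Soochow University, Suzhou, China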
19th Chinese National Conference on Computational Linguistics
Haikou (CN)