Examining LDA2Vec and Tweet Pooling for Topic Modeling on Twitter Data
The short length of tweets makes it difficult for topic modeling to extend beyond what hashtag information provides explicitly. This is particularly true for LDA-based methods, because the amount of information available from per-tweet statistical analysis is severely limited. In this paper we present LDA2Vec paired with temporal tweet pooling (LDA2Vec-TTP) and assess its performance on this problem relative to traditional LDA and to the Biterm Topic Model (Biterm), which was developed specifically for topic modeling on short text documents. We paired each of the three topic modeling algorithms with three tweet pooling schemes: no pooling, author-based pooling, and temporal pooling. We then conducted topic modeling on two Twitter datasets using each combination of algorithm and pooling scheme. Our results on the larger dataset suggest that LDA2Vec-TTP can produce higher coherence scores and more logically coherent and interpretable topics.
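The three pooling schemes named above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `pool_tweets`, the record keys `author`, `time`, and `text`, and the fixed one-hour window are all assumptions made for illustration, since the abstract does not state how the temporal windows were defined. The idea is only that pooling concatenates short tweets into longer pseudo-documents before a topic model sees them.

```python
from collections import defaultdict
from datetime import datetime, timezone


def pool_tweets(tweets, scheme="none", window_hours=1):
    """Group tweets into pseudo-documents (hypothetical helper).

    tweets: list of dicts with assumed keys 'author', 'time', 'text',
            where 'time' is a timezone-aware datetime.
    scheme: 'none', 'author', or 'temporal'.
    window_hours: assumed width of a temporal bucket, in hours.
    """
    if scheme == "none":
        # No pooling: every tweet is its own document.
        return [t["text"] for t in tweets]

    pools = defaultdict(list)
    for t in tweets:
        if scheme == "author":
            # Author-based pooling: one document per author.
            key = t["author"]
        elif scheme == "temporal":
            # Temporal pooling: one document per fixed-width time window,
            # counted from the Unix epoch.
            key = int(t["time"].timestamp() // (window_hours * 3600))
        else:
            raise ValueError(f"unknown pooling scheme: {scheme}")
        pools[key].append(t["text"])

    # Join the tweets in each pool into a single pseudo-document.
    return [" ".join(texts) for texts in pools.values()]


# Made-up sample data, for illustration only.
sample = [
    {"author": "a", "text": "storm warning",
     "time": datetime(2020, 1, 1, 10, 5, tzinfo=timezone.utc)},
    {"author": "b", "text": "storm hits coast",
     "time": datetime(2020, 1, 1, 10, 40, tzinfo=timezone.utc)},
    {"author": "a", "text": "power outage",
     "time": datetime(2020, 1, 1, 11, 10, tzinfo=timezone.utc)},
]

for scheme in ("none", "author", "temporal"):
    print(scheme, pool_tweets(sample, scheme))
```

Each pooled list would then be tokenized and passed to the topic model under comparison (LDA, Biterm, or LDA2Vec). With temporal pooling, tweets posted in the same window are merged regardless of author, which is one plausible reading of why it might help on event-driven data.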