A Comprehensive Pre-processing Approach for High-Performance Classification of Twitter Data with several Machine Learning Algorithms

Abstract: With an average of five hundred million tweets produced per day, Twitter has grown into one of the most comprehensive platforms of data interpretation for researchers. Previously, various studies have been conducted on Twitter data, e.g., sentiment analysis. Nevertheless, little research has been performed on classifying tweets into categories so that tweets can be distributed according to user preferences. In this research, we started by constructing four comprehensive classes: politics, sports, crime and natural. Next, we applied our proposed preprocessing model to the raw Twitter dataset. After that, we applied different machine learning techniques (Random Forest, K-Nearest Neighbors, Naive Bayes, Logistic Regression, Decision Tree and Support Vector Machine) to classify the Twitter data. Finally, we examined the outcomes with and without preprocessing in terms of sensitivity, specificity, and accuracy. We found that our proposed preprocessing model enhanced the performance of all the machine learning classifiers.
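
The three evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes a one-vs-rest reading of each of the four classes, which the abstract does not specify; the function name one_vs_rest_metrics and the toy labels are hypothetical and are not taken from the paper.

```python
# Hypothetical illustration: per-class sensitivity, specificity and accuracy,
# treating one class as "positive" and the other three as "negative".
# The one-vs-rest scheme is an assumption, not the paper's stated procedure.

CLASSES = ["politics", "sports", "crime", "natural"]


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Return (sensitivity, specificity, accuracy) for one positive class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Guard against classes that never occur in the toy data.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Toy labels, invented for illustration only.
y_true = ["politics", "sports", "crime", "natural", "sports", "crime"]
y_pred = ["politics", "sports", "sports", "natural", "sports", "crime"]

for label in CLASSES:
    sens, spec, acc = one_vs_rest_metrics(y_true, y_pred, label)
    print(f"{label}: sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

Running the same function on predictions obtained with and without a preprocessing step would give the kind of side-by-side comparison the abstract describes.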

Keywords:

Twitter Data Classification; Novel Preprocessing Model; Random Forest; K-Nearest Neighbors; Naive Bayes; Logistic Regression; Decision Tree; Support Vector Machine

Authors:

Ananya Sarker, Md. Rabiul Islam, Azmain Yakin Srizon

Author affiliation:

Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh

Conference name:

IEEE Region 10 Symposium

Conference location:

Dhaka (BD)

Proceedings:

2020 IEEE Region 10 Symposium

Pages:

630-633

Publication year:

2020

DOI:

10.1109/TENSYMP50017.2020.9230590