Performance Comparison and Analysis of Classification Algorithms Based on Spark Platform
In view of the rapid development of big data and machine learning technology, the MLlib machine learning library on the Spark platform is used to implement three machine learning algorithms: a feedforward artificial neural network, a support vector machine (SVM), and a random forest. The running performance and classification performance of the three algorithms on the big data platform are then analyzed and evaluated. The experimental results show that the time consumed by all three algorithms gradually decreases as the number of nodes increases. When the dataset is smaller than 100 MB, the speedup ratio of the neural network and SVM algorithms is higher; when the dataset is larger than 1 GB, the speedup ratio of the random forest algorithm is better than that of the other two algorithms. The neural network algorithm has its lowest scalability when the dataset is 100 MB, and the SVM algorithm has its lowest scalability when the dataset is 500 MB. The random forest algorithm scales better than the other two algorithms when the dataset is larger than 1 GB. Comparing the time efficiency and accuracy of the three classification algorithms, the SVM algorithm consumes the least time but has the lowest classification accuracy. The neural network algorithm consumes the most time, and its classification accuracy is lower than that of the random forest algorithm. The random forest algorithm has the highest classification accuracy, but its running time is longer than that of the SVM algorithm. Overall, the ensemble (integrated) classification algorithm shows better time performance and classification accuracy on the big data platform.
big data; Hadoop framework; Spark framework; machine learning; performance evaluation
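
The abstract reports speedup (acceleration ratio) and scalability without stating how they are computed. The sketch below uses the definitions commonly applied in cluster benchmarking; it is an assumption that the paper uses the same ones. Speedup is taken as the single-node running time divided by the n-node running time on the same dataset, and scaleup as the single-node time on a base dataset divided by the n-node time on a dataset n times larger. The function names and the timing values are hypothetical and are not taken from the paper's experiments.

```python
def speedup(t_single_node: float, t_n_nodes: float) -> float:
    """Speedup: time on one node divided by time on n nodes, same dataset."""
    if t_single_node <= 0 or t_n_nodes <= 0:
        raise ValueError("running times must be positive")
    return t_single_node / t_n_nodes


def scaleup(t_base: float, t_scaled: float) -> float:
    """Scaleup: time on one node with a base dataset divided by the time
    on n nodes with a dataset n times larger. Values near 1.0 indicate
    that the system absorbs proportional growth well."""
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("running times must be positive")
    return t_base / t_scaled


if __name__ == "__main__":
    # Hypothetical running times in seconds, keyed by node count.
    # These numbers are illustrative only, not measured results.
    timings = {1: 120.0, 2: 70.0, 4: 40.0}
    for nodes, seconds in timings.items():
        print(nodes, "nodes: speedup =", speedup(timings[1], seconds))
```

Under these definitions, an algorithm whose speedup grows close to linearly with the node count, and whose scaleup stays close to 1.0, would be the one described in the abstract as scaling well on larger datasets.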