Parallel support vector machine algorithm based on relative entropy and cosine similarity
Parallel support vector machine algorithms in big data environments suffer from three problems: large distribution deviation among data subsets, low parallel efficiency, and inaccurate filtering of non-support vectors. To address them, a Parallel Support Vector Machine algorithm based on Relative Entropy and Cosine Similarity (RC-PSVM) was proposed. First, a Data Partitioning strategy based on Relative Entropy (DPRE) was proposed. It balanced the relative entropy between the current subset and the original data set and assigned each sample to a suitable subset, which reduced the deviation of the subset distributions. Then, a Redundancy Level Detection Strategy based on Cosine Similarity (CS-RLDS) was designed. It calculated the cosine similarity between the normal vectors of the local support vector machines in adjacent layers and compared this similarity with a set threshold to identify a redundant level and stop the computation there, which improved the parallel efficiency. Finally, a Non-Support Vector Filtering strategy (NSVF) was developed. It calculated a support vector similarity by combining the distances between a sample and the decision boundaries of multiple local support vector machine models, and used this similarity to identify non-support vectors, which solved the problem of inaccurate non-support-vector filtering. Experimental results showed that the RC-PSVM algorithm achieved better classification performance and ran more efficiently on big data.
Keywords: big data; MapReduce framework; parallel support vector machine; relative entropy; cosine similarity
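The DPRE idea of assigning samples so that each subset's distribution stays close to that of the original data set can be illustrated with a minimal sketch. It assumes that the distribution being balanced is the class-label distribution, that relative entropy means the KL divergence D(subset || original), and that samples are placed greedily into equally sized subsets. The function names `distribution`, `relative_entropy` and `partition_by_relative_entropy` are hypothetical, and the greedy rule is an illustration, not the paper's exact DPRE procedure.

```python
import math
from collections import Counter


def distribution(counts):
    """Turn a Counter of class labels into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def relative_entropy(p, q, eps=1e-12):
    """KL divergence D(p || q); eps guards against log(0) for labels missing from q."""
    return sum(pk * math.log((pk + eps) / (q.get(k, 0.0) + eps))
               for k, pk in p.items() if pk > 0)


def partition_by_relative_entropy(labels, num_subsets):
    """Hypothetical greedy DPRE-style split: place each sample in the non-full
    subset whose label distribution, after adding the sample, has the smallest
    KL divergence from the global one. Ties go to the smaller subset, then to
    the lower subset index."""
    global_dist = distribution(Counter(labels))
    capacity = math.ceil(len(labels) / num_subsets)
    counts = [Counter() for _ in range(num_subsets)]
    subsets = [[] for _ in range(num_subsets)]
    for idx, y in enumerate(labels):
        best, best_key = None, None
        for s in range(num_subsets):
            if len(subsets[s]) >= capacity:
                continue  # subset already holds its share of samples
            trial = counts[s].copy()
            trial[y] += 1
            key = (relative_entropy(distribution(trial), global_dist), len(subsets[s]))
            if best_key is None or key < best_key:
                best, best_key = s, key
        subsets[best].append(idx)
        counts[best][y] += 1
    return subsets


labels = [0] * 40 + [1] * 40  # label-sorted input: worst case for a contiguous split
parts = partition_by_relative_entropy(labels, 4)
```

On this label-sorted input, a contiguous split into four blocks would give every subset a single class. The greedy rule instead ends with 20 samples per subset, 10 of each class, so every subset matches the original 50/50 distribution.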
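The CS-RLDS check compares the normal vectors of local support vector machines in adjacent layers against a similarity threshold. A minimal sketch of that check follows. It assumes linear local models whose normal vectors w are available. It also assumes that a layer with several local models is summarized by the element-wise mean of their normal vectors. That summary, the default threshold of 0.99, and the names `layer_normal` and `first_redundant_layer` are hypothetical choices, not details given in the abstract.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def layer_normal(local_normals):
    """Hypothetical layer summary: element-wise mean of the local SVMs' normal vectors w."""
    n = len(local_normals)
    return [sum(col) / n for col in zip(*local_normals)]


def first_redundant_layer(layers, threshold=0.99):
    """Scan the layers in order and return the index of the first layer whose
    summarized normal vector has cosine similarity >= threshold with the
    previous layer's (computation could stop there). Return None otherwise."""
    prev = None
    for idx, local_normals in enumerate(layers):
        current = layer_normal(local_normals)
        if prev is not None and cosine_similarity(prev, current) >= threshold:
            return idx
        prev = current
    return None


# Each layer is a list of local normal vectors w (2-D here for readability).
layers = [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 1.0]], [[1.0, 1.01]]]
stop_at = first_redundant_layer(layers)
```

Here the hyperplane orientation still changes between layers 0 and 1 (cosine similarity about 0.71). Between layers 1 and 2 it barely moves (cosine similarity about 0.99999), so layer 2 is flagged as redundant.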
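NSVF scores a sample by combining its distances to the decision boundaries of several local support vector machine models. The abstract does not give the formula, so the sketch below uses a stand-in score. It assumes linear local models given as pairs (w, b) and labels in {-1, +1}. The "support vector similarity" here is the fraction of local models for which the sample lies within (1 + slack) times that model's margin half-width 1/||w||, on either side of the boundary or misclassified. Samples scoring below a cut-off tau are dropped as likely non-support vectors. The score definition, the parameters `slack` and `tau`, and all function names are hypothetical.

```python
import math


def signed_distance(w, b, x, y):
    """Signed geometric distance of (x, y) from the hyperplane w.x + b = 0;
    positive when x lies on the side of its own label y in {-1, +1}."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def sv_similarity(models, x, y, slack=0.5):
    """Hypothetical score: fraction of local models for which (x, y) lies no
    farther than (1 + slack) margin half-widths (1/||w||) on its correct side."""
    near = 0
    for w, b in models:
        half_margin = 1.0 / math.sqrt(sum(wi * wi for wi in w))
        if signed_distance(w, b, x, y) <= (1.0 + slack) * half_margin:
            near += 1
    return near / len(models)


def filter_non_support_vectors(models, samples, tau=0.5, slack=0.5):
    """Keep samples whose score is >= tau; drop the rest as likely non-support vectors."""
    return [(x, y) for x, y in samples if sv_similarity(models, x, y, slack) >= tau]


# Two hypothetical local linear models (w, b) trained on different subsets.
models = [([1.0, 0.0], 0.0), ([1.0, 0.1], 0.0)]
samples = [([0.5, 0.0], 1), ([5.0, 0.0], 1), ([-0.2, 0.0], 1), ([-5.0, 0.0], -1)]
kept = filter_non_support_vectors(models, samples)
```

In this example, the sample near the boundary and the misclassified sample are kept. The two samples lying far on their correct side for both local models are filtered out. Requiring agreement across several local models, rather than trusting a single one, is what the combined score is meant to capture.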