Parallel SVM Algorithm Based on Relative Entropy and Cosine Similarity

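Abstract (translated from the Chinese): Parallel support vector machine (SVM) algorithms in big-data environments suffer from large distribution deviation among subsets, low parallel efficiency, and inaccurate filtering of non-support vectors. To address these problems, a parallel SVM algorithm based on relative entropy and cosine similarity (RC-PSVM) is proposed. First, a data partitioning strategy based on relative entropy (DPRE) balances the relative entropy between the current subset and the original data set and assigns each sample to a suitable subset, which reduces the distribution deviation of the subsets. Second, a redundant-level detection strategy based on cosine similarity (CS-RLDS) computes the cosine similarity between the normal vectors of local SVMs in adjacent layers and compares it with a preset threshold to identify and stop redundant levels, which improves parallel efficiency. Finally, a non-support vector filtering strategy (NSVF) combines the distances from a sample to the decision boundaries of multiple local support vector models to compute a support vector similarity, which is used to identify non-support vectors and thereby addresses the inaccurate filtering. Experiments show that RC-PSVM achieves better classification performance and runs more efficiently on big data.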
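The abstract does not give the DPRE formulas, so the following is only a minimal sketch of the general idea: measure the relative entropy (KL divergence) between a subset's distribution and that of the full data set, and place a sample where the result stays closest to the original distribution. Using class-label frequencies as the distribution, the smoothing constant `eps`, and the greedy rule in the hypothetical helper `assign_sample` are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

def label_distribution(labels, classes, eps=1e-9):
    """Smoothed class-frequency distribution over `classes` (eps avoids log(0))."""
    counts = Counter(labels)
    total = len(labels) + eps * len(classes)
    return [(counts[c] + eps) / total for c in classes]

def relative_entropy(p, q):
    """Relative entropy (KL divergence) D(p || q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def assign_sample(label, subsets, full_labels, classes):
    """Hypothetical greedy rule: add the sample to the subset whose label
    distribution, after the addition, is closest to the full data set."""
    q = label_distribution(full_labels, classes)

    def cost(subset):
        p = label_distribution(subset + [label], classes)
        return relative_entropy(p, q)

    best = min(range(len(subsets)), key=lambda i: cost(subsets[i]))
    subsets[best].append(label)
    return best

# Toy data: the full set is balanced; subset 1 is short of class 0.
full = [0, 0, 1, 1]
subsets = [[0, 0, 1], [0, 1, 1]]
chosen = assign_sample(0, subsets, full, classes=[0, 1])
```

In this toy case the new class-0 sample goes to the second subset, which becomes balanced like the full set. A practical partitioner would also need to keep subset sizes comparable, which this sketch ignores.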

English title: Parallel support vector machine algorithm based on relative entropy and cosine similarity

English abstract: To address the problems of parallel support vector machine (SVM) algorithms in big-data environments, namely large subset distribution deviation, low parallel efficiency, and inaccurate filtering of non-support vectors, a Parallel Support Vector Machine algorithm based on Relative Entropy and Cosine Similarity (RC-PSVM) was proposed. First, a Data Partitioning strategy based on Relative Entropy (DPRE) was proposed, which balanced the relative entropy of the current subset and the original data set and assigned each sample to a suitable subset to reduce the deviation of the subset distribution. Then, a Redundancy Level Detection Strategy based on Cosine Similarity (CS-RLDS) was designed, which calculated the cosine similarity between the normal vectors of local support vector machines in adjacent layers and compared it with a set threshold to identify and stop redundant levels, which improved parallel efficiency. Finally, a Non-Support Vector Filtering strategy (NSVF) was developed, which calculated a support vector similarity by combining the distances between a sample and the decision boundaries of multiple local support vector models in order to identify non-support vectors, addressing the inaccurate filtering of non-support vectors. Experiments showed that the RC-PSVM algorithm achieved better classification performance and ran more efficiently on big data.
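The CS-RLDS step compares the normal vectors of local SVMs in adjacent layers. A minimal sketch of such a check is shown below, assuming linear local SVMs whose normal vectors are available as plain lists of floats. The function names, the threshold value 0.99, and the rule "similarity at or above the threshold means the next layer is redundant" are illustrative assumptions; the abstract does not specify them.

```python
import math

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("normal vectors must be non-zero")
    return dot / (norm_u * norm_v)

def layer_is_redundant(w_prev, w_curr, threshold=0.99):
    """Assumed rule: if the hyperplane direction barely changed between two
    adjacent layers, further layers are treated as redundant."""
    return cosine_similarity(w_prev, w_curr) >= threshold

# Hypothetical normal vectors of local SVMs from two adjacent layers.
w_layer1 = [1.0, 2.0]
w_layer2 = [1.1, 2.1]   # nearly the same direction as w_layer1
w_other = [2.0, -1.0]   # orthogonal to w_layer1

stop_early = layer_is_redundant(w_layer1, w_layer2)
keep_going = not layer_is_redundant(w_layer1, w_other)
```

Comparing directions rather than raw weights makes the check insensitive to the scale of the normal vector, which is one plausible reason for choosing cosine similarity here.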

English keywords:

big data; MapReduce framework; parallel support vector machine; relative entropy; cosine similarity

Authors:

Mao Yimin (毛伊敏), Guo Binbin (郭斌斌), Yi Jianbing (易见兵), Chen Zhigang (陈志刚)

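Author affiliations:

School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000

School of Computer Science, Central South University, Changsha, Hunan 410083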

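Keywords (translated from the Chinese):

big data; MapReduce framework; parallel support vector machine; relative entropy; cosine similarity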

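Funding:

National Natural Science Foundation of China; Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project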

Project numbers:

41562019; 2020AAA0109605

Publication year:

2024

DOI:

10.13196/j.cims.2022.0084

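Computer Integrated Manufacturing Systems (计算机集成制造系统)

No. 210 Research Institute of China North Industries Group Corporation (中国兵器工业集团第210研究所)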

CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 1.092

ISSN: 1006-5911

Year, volume (issue): 2024, 30(9)