Optimizing the Yinyang K-means algorithm on many-core CPUs
周天阳 1, 王庆林 1, 李荣春 1, 梅松竹 1, 尹尚飞 1, 郝若晨 1, 刘杰 1
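Author information
- 1. College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan 410073, China; National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha, Hunan 410073, China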
Abstract
The traditional Yinyang K-means algorithm is computationally expensive on large-scale clustering problems. Based on the architectural characteristics of typical many-core CPUs, an efficient parallel implementation of the Yinyang K-means algorithm is proposed. The implementation adopts a new in-memory data layout, uses the vector units of many-core CPUs to accelerate the distance calculations in Yinyang K-means, and applies memory access optimizations targeted at non-uniform memory access (NUMA) characteristics. Compared with the open-source multi-threaded implementation of the Yinyang K-means algorithm, it achieves speedups of up to approximately 5.6 and 8.7 on ARMv8 and x86 many-core platforms, respectively. The experiments show that these optimizations successfully accelerate the Yinyang K-means algorithm on many-core CPUs.
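To make the link between data layout and vectorized distance calculation concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not describe the new layout, so the dimension-major (structure-of-arrays) layout used here is an assumption, and the function names `to_dimension_major` and `squared_distances` are hypothetical. Plain Python only illustrates the access pattern; a real implementation would use SIMD instructions in a compiled language.

```python
# Hypothetical sketch: dimension-major layout for point-to-center distances.
# The layout and all names here are assumptions, not taken from the paper.

def to_dimension_major(points):
    """Transpose row-major points [n][d] into dimension-major columns [d][n]."""
    return [list(column) for column in zip(*points)]


def squared_distances(columns, center):
    """Return the squared Euclidean distance from every point to one center.

    The inner loop walks one contiguous column, so consecutive iterations
    touch consecutive points. This is the access pattern that lets a vector
    unit process several points per instruction.
    """
    n = len(columns[0])
    dist = [0.0] * n
    for column, c in zip(columns, center):
        for i in range(n):
            diff = column[i] - c
            dist[i] += diff * diff
    return dist


points = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
columns = to_dimension_major(points)
print(squared_distances(columns, [0.0, 0.0]))
```

With a row-major layout the same loop would stride across memory by the point dimension, which is one common reason a layout change precedes vectorization.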
Key words
K-means / NUMA / vectorization / many-core CPU / performance optimization
Publication year
2024