Optimizing Yinyang K-means algorithm on many-core CPUs
The traditional Yinyang K-means algorithm is computationally expensive on large-scale clustering problems. An efficient parallel implementation of the Yinyang K-means algorithm is proposed, based on the architectural characteristics of typical many-core CPUs. The implementation adopts a new in-memory data layout, uses the vector units of many-core CPUs to accelerate the distance calculations in Yinyang K-means, and optimizes memory access for NUMA (non-uniform memory access) characteristics. Compared with the open-source multi-threaded version of the Yinyang K-means algorithm, the implementation achieves speedups of up to approximately 5.6 and 8.7 on ARMv8 and x86 many-core CPUs, respectively. Experiments show that the optimizations successfully accelerate the Yinyang K-means algorithm on many-core CPUs.
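To illustrate how a memory layout can help vector units in the distance calculation, the following is a minimal sketch in C. The abstract does not specify the layout used, so the dimension-major centroid array here is an assumption, and the names sq_distances, nearest_centroid, and centroids_t are hypothetical. With this layout the inner loop runs over centroids at unit stride, which lets a compiler map it to SIMD instructions on ARMv8 or x86. NUMA-aware placement is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Squared Euclidean distances from one point to all k centroids.
 * Assumed layout (not taken from the paper): centroids_t is dimension-major,
 * i.e. centroids_t[j * k + c] holds coordinate j of centroid c.
 * The inner loop therefore reads and writes contiguous memory, and the
 * restrict qualifiers tell the compiler the arrays do not overlap. */
void sq_distances(const float *restrict point,
                  const float *restrict centroids_t,
                  size_t dim, size_t k,
                  float *restrict out)
{
    assert(point != NULL && centroids_t != NULL && out != NULL);

    for (size_t c = 0; c < k; ++c) {
        out[c] = 0.0f;
    }
    for (size_t j = 0; j < dim; ++j) {
        const float pj = point[j];
        const float *row = centroids_t + j * k;
        /* Unit-stride loop over centroids: a candidate for auto-vectorization. */
        for (size_t c = 0; c < k; ++c) {
            const float diff = row[c] - pj;
            out[c] += diff * diff;
        }
    }
}

/* Index of the smallest value in dist[0..k-1]; k must be at least 1. */
size_t nearest_centroid(const float *dist, size_t k)
{
    assert(dist != NULL && k > 0);

    size_t best = 0;
    for (size_t c = 1; c < k; ++c) {
        if (dist[c] < dist[best]) {
            best = c;
        }
    }
    return best;
}
```

For example, with the two-dimensional centroids (0, 0), (3, 4), and (10, 10) stored as {0, 3, 10, 0, 4, 10}, calling sq_distances for the point (3, 3) fills out with the three squared distances, and nearest_centroid then selects the second centroid. Building with optimization enabled (for instance -O3 with GCC or Clang) is typically needed before the compiler vectorizes the inner loop.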