Existing training strategies for clustering ensemble algorithms generally apply different base clustering algorithms to the same data, and they commonly suffer from low performance on large-scale data and weak adaptability of the consensus function. To address these problems, this paper proposes a label iteration-based clustering ensemble (LICE) algorithm built on the alternative training strategy of applying the same base clustering algorithm to different data. Firstly, multiple base clusterings are trained on random sample partition (RSP) data blocks. Secondly, the base clustering results with the same number of clusters are fused using the maximum mean discrepancy criterion, and a heuristic classifier is then trained on the labeled RSP data blocks. Thirdly, the unlabeled sample points are labeled by the heuristic classifier, which is iteratively enhanced with the labeled sample points whose clustering and classification labels are consistent. Finally, a series of experiments was conducted to validate the feasibility and effectiveness of the LICE algorithm. The experimental results show that the normalized mutual information, adjusted Rand index, Fowlkes-Mallows index, and purity of the LICE algorithm increased by 17.23%, 16.75%, 31.29%, and 12.37% on average at the 5th iteration compared with the initial iteration, and that these four indexes were 11.76%, 16.50%, 9.36%, and 14.20% higher on average on the representative datasets than those of seven state-of-the-art clustering ensemble algorithms. These results demonstrate that LICE is an efficient and reasonable clustering ensemble algorithm with the potential to handle large-scale data clustering problems.
clustering ensemble algorithm; ensemble learning; random sample partition; maximum mean discrepancy; label iteration
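To make the fusion step more concrete, the following is a minimal sketch of how clusters found on two different RSP data blocks might be aligned by maximum mean discrepancy (MMD). It is an illustration under stated assumptions, not the LICE implementation: the Gaussian (RBF) kernel, the bandwidth `gamma`, the biased MMD estimator, the greedy nearest-cluster matching, and the names `rbf_kernel`, `mmd2`, and `match_clusters` are all hypothetical choices made here.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise squared Euclidean distances, then a Gaussian kernel (assumed kernel choice).
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(x, y, gamma=1.0):
    # Biased estimate of the squared MMD between samples x and y.
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

def match_clusters(clusters_a, clusters_b, gamma=1.0):
    # Hypothetical greedy matching: for each cluster in block A, return the index
    # of the cluster in block B with the smallest MMD (not forced to be one-to-one).
    return [int(np.argmin([mmd2(ca, cb, gamma) for cb in clusters_b]))
            for ca in clusters_a]

# Synthetic stand-ins for two data blocks, each already split into two clusters.
rng = np.random.default_rng(0)
block_a = [rng.normal(0.0, 0.5, size=(50, 2)), rng.normal(5.0, 0.5, size=(50, 2))]
# Block B lists the same two groups in the opposite order.
block_b = [rng.normal(5.0, 0.5, size=(50, 2)), rng.normal(0.0, 0.5, size=(50, 2))]

matches = match_clusters(block_a, block_b)
print(matches)  # cluster 0 of A aligns with cluster 1 of B, and vice versa
```

Because cluster labels produced on different blocks are arbitrary, some alignment of this kind is needed before results with the same number of clusters can be merged. A distribution-level distance such as MMD allows clusters to be compared even though the blocks share no sample points.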