Active learning combined with clustering boundary sampling
Active learning is a machine learning method that selects the most valuable samples for labeling. Currently, active learning faces several challenges in practical application. It relies on prior assumptions about the classifier, which can cause unexpected declines in classifier performance, and it requires a certain number of samples as an initial condition. Clustering, which can reduce the complexity of a problem, is an effective tool in active learning. This study investigates active learning methods based on boundary sampling in density clustering. First, a method for sampling boundary points in density peak clustering is introduced. This method calculates the sample density in the cluster boundary region, which is prone to classification errors. Subsequently, with density entropy defined, an active learning method based on cluster boundary sampling is proposed. This method uses density entropy to guide a heuristic search of cluster boundary regions. The experimental results show that, compared with five active learning algorithms from the literature, the proposed algorithm achieves equal or higher classification performance with fewer labeled samples, which indicates that it is an effective active learning algorithm. When the number of labeled samples is less than 20% of the total number of unlabeled samples, the algorithm achieves better results on the accuracy and F-score metrics.
active learning; machine learning; cluster boundary; density peak clustering; geometric sampling; entropy; version space; active clustering
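To make the idea of sampling boundary points in density peak clustering concrete, the following is a minimal sketch rather than the algorithm proposed in this study. It assumes the standard cutoff-kernel form of density peak clustering: local density is the number of neighbors within a cutoff distance `d_c`, and a sample is treated as a boundary point when a sample of another cluster lies within `d_c`. The abstract does not give the definition of density entropy, so `neighborhood_label_entropy` is a hypothetical stand-in (Shannon entropy of the cluster labels inside the `d_c` neighborhood). All function names, `d_c`, and `n_clusters` are illustrative assumptions.

```python
import numpy as np


def cluster_with_boundary(X, d_c, n_clusters):
    """Cutoff-kernel density peak clustering with a boundary mask.

    Illustrative sketch only; not the method proposed in the paper.
    """
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Local density: number of other samples closer than d_c.
    rho = (dist < d_c).sum(axis=1) - 1

    # delta: distance to the nearest sample that ranks higher in density.
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    nearest_denser = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = dist[i].max()  # densest sample: farthest distance
        else:
            denser = order[:rank]
            j = denser[np.argmin(dist[i, denser])]
            delta[i] = dist[i, j]
            nearest_denser[i] = j

    # Cluster centers: samples with the largest rho * delta.
    centers = np.argsort(-(rho * delta), kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)

    # Remaining samples inherit the label of their nearest denser sample,
    # visited in order of decreasing density.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]

    # Boundary samples: a sample from another cluster lies within d_c.
    other_cluster = labels[:, None] != labels[None, :]
    boundary = np.any(other_cluster & (dist < d_c), axis=1)
    return labels, boundary, dist


def neighborhood_label_entropy(dist, labels, d_c, n_clusters):
    """Hypothetical stand-in for density entropy (assumption, see lead-in).

    Shannon entropy of cluster labels among samples within d_c,
    the sample itself included.
    """
    near = (dist < d_c).astype(float)
    counts = near @ np.eye(n_clusters)[labels]
    p = counts / counts.sum(axis=1, keepdims=True)
    log_p = np.log(p, where=p > 0, out=np.zeros_like(p))
    return -(p * log_p).sum(axis=1)


if __name__ == "__main__":
    # Two groups on a line; the samples at x=4 and x=6 face each other.
    xs = [0, 1, 2, 3, 4, 6, 7, 8, 9, 10]
    X = np.array([[x, 0.0] for x in xs])
    labels, boundary, dist = cluster_with_boundary(X, d_c=2.5, n_clusters=2)
    entropy = neighborhood_label_entropy(dist, labels, 2.5, 2)
    # Query boundary samples first, most mixed neighborhoods first.
    candidates = np.flatnonzero(boundary)
    print(candidates[np.argsort(-entropy[candidates])])
```

Under these assumptions, an active learner would request labels for the boundary samples with the highest entropy first, since their neighborhoods mix clusters and are the most likely to be misclassified.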