Active learning combined with clustering boundary sampling

Active learning is a machine learning approach that selects the most valuable samples for labeling. In practice it faces two challenges: it depends on prior assumptions about the classifier, which can cause unexpected drops in classifier performance, and it needs a certain number of samples as a starting condition. Clustering reduces the scale of the problem and is therefore an effective tool for active learning. This study investigates active learning combined with density clustering boundary sampling. First, for the cluster boundary regions where classification errors are most likely, a boundary point sampling method for density peak clustering is proposed, based on computing sample density. Building on this, density entropy is defined and used to guide a heuristic search of the cluster boundary regions, yielding an active learning method based on cluster boundary sampling. Experimental results show that, compared with five active learning algorithms from the literature, the proposed algorithm achieves equal or even higher classification performance with fewer labels, which indicates that it is an effective active learning algorithm. When labels are scarce, with labeled samples amounting to less than 20% of the total number of unlabeled samples, the algorithm achieves good results on accuracy, F-score, and other metrics.
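
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the density computation that density peak clustering generally rests on: a local density for each sample and its distance to the nearest sample of higher density. The function name `density_peak_scores`, the cutoff parameter `d_c`, and the Gaussian kernel are assumptions made here for illustration. The sketch is not the authors' boundary sampling method and does not implement their density entropy.

```python
import numpy as np

def density_peak_scores(X, d_c):
    """Return (rho, delta) as used in standard density peak clustering.

    rho[i]   : Gaussian-kernel local density of sample i (assumed kernel).
    delta[i] : distance from sample i to the nearest sample with higher density;
               for a sample with no denser neighbor, its largest distance.
    d_c      : cutoff distance (hypothetical parameter name).
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sum kernel weights over all samples, then drop the self term exp(0) = 1.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    n = len(X)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        if higher.any():
            delta[i] = dist[i, higher].min()
        else:
            delta[i] = dist[i].max()
    return rho, delta

# Two tight groups plus one isolated sample lying between them.
X = [[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5], [2.5, 2.5]]
rho, delta = density_peak_scores(X, d_c=1.0)
print(np.argmin(rho))  # index of the sparsest sample
```

Samples with low density that sit between dense groups, like the last one in this toy data, are the kind of points a boundary-oriented sampler would be expected to examine first; how the paper actually ranks them is not stated in the abstract.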

active learning; machine learning; cluster boundary; density peak clustering; geometric sampling; entropy; version space; active clustering

Hu Feng (胡峰), Li Luzheng (李路正), Dai Jin (代劲), Liu Qun (刘群)

School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065

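National Key Research and Development Program of China (2018YFC0832102); Key Cooperation Project of the Chongqing Municipal Education Commission (HZ2021008); Natural Science Foundation of Chongqing (cstc2021jcyj-msxmX0849)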

2024

CAAI Transactions on Intelligent Systems
Chinese Association for Artificial Intelligence; Harbin Engineering University

CSTPCD; Peking University Core Journals
Impact factor: 0.672
ISSN: 1673-4785
Year, volume (issue): 2024, 19(2)