
Self-training Algorithm Combining Density Peak and Integrated Filter

Accurately selecting high-confidence samples is the key to improving the classification performance of self-training algorithms. To address samples that are misclassified during self-training iterations, a self-training algorithm combining density peaks and an integrated filter is proposed. The algorithm first uses density peak clustering to calculate the density and peak value of each sample and constructs an initial high-confidence sample set. Then, to filter out samples misclassified during the iterations, an integrated filter is designed that further selects high-confidence samples from the initial set and adds them to the labeled sample set for iterative training. Comparative experiments were conducted against 4 related self-training algorithms on 9 datasets. The results show that the average accuracy and F-score of the proposed algorithm are 67.90% and 65.54%, respectively, and that its classification performance is significantly better than that of the comparison algorithms.
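The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes the standard density peak quantities (a Gaussian-kernel local density `rho` with a cutoff distance `dc`, and `delta` as the distance to the nearest point of higher density), takes unlabeled samples with above-average density as the initial high-confidence set, and stands in for the integrated filter with a hypothetical unanimous vote among k-nearest-neighbour classifiers. All function names, the threshold rule, and the choice of voters are illustrative.

```python
import math
from collections import Counter


def density_peaks(points, dc):
    """Return (rho, delta) for each point, as in standard density peak clustering."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Gaussian-kernel local density: closer neighbours contribute more.
    rho = [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbour gets its largest distance instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def knn_predict(train_x, train_y, x, k):
    """Majority label among the k labeled points nearest to x."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def self_train(lab_x, lab_y, unl_x, dc=1.0, ks=(1, 3), max_iter=10):
    """Self-training with a density-based candidate set and a voting filter."""
    lab_x, lab_y, unl_x = list(lab_x), list(lab_y), list(unl_x)
    for _ in range(max_iter):
        if not unl_x:
            break
        rho, _ = density_peaks(lab_x + unl_x, dc)
        rho_unl = rho[len(lab_x):]
        threshold = sum(rho_unl) / len(rho_unl)
        # Assumed rule: above-average density marks an initial candidate.
        candidates = [i for i, r in enumerate(rho_unl) if r >= threshold]
        accepted = []
        for i in candidates:
            votes = {knn_predict(lab_x, lab_y, unl_x[i], k) for k in ks}
            # Assumed filter: keep a candidate only if every voter agrees.
            if len(votes) == 1:
                accepted.append((i, votes.pop()))
        if not accepted:
            break
        for i, label in accepted:
            lab_x.append(unl_x[i])
            lab_y.append(label)
        taken = {i for i, _ in accepted}
        unl_x = [x for i, x in enumerate(unl_x) if i not in taken]
    return lab_x, lab_y
```

Calling `self_train` with a few labeled points and a list of unlabeled points returns the enlarged labeled set. Samples on which the voters disagree are never added, which is the role the abstract assigns to the filter.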

Keywords: self-training; unlabelled sample; high-confidence sample; density peak; integrated filters

Han Yunlong (韩运龙), Shang Qingsheng (尚庆生), Zhao Wei (赵薇), Guo Hong (郭泓)


School of Information Engineering, Lanzhou University of Finance and Economics, Lanzhou 730020, Gansu, China


Funding: Natural Science Foundation of Gansu Province (21JR1RA283)

2024

Journal of Yibin University
Yibin University


CHSSCD
Impact factor: 0.185
ISSN: 1671-5365
Year, volume (issue): 2024, 24(6)