Self-training Algorithm Combining Density Peak and Integrated Filter

Han Yunlong (韩运龙)¹, Shang Qingsheng (尚庆生)¹, Zhao Wei (赵薇)¹, Guo Hong (郭泓)¹

Author information

1. School of Information Engineering, Lanzhou University of Finance and Economics, Lanzhou 730020, Gansu, China

Abstract

Accurately selecting high-confidence samples is the key to improving the classification performance of self-training algorithms. To address the samples that are misclassified during self-training iterations, a self-training algorithm combining density peaks and an integrated filter is proposed. The algorithm first uses density peak clustering to calculate the density and peak value of each sample and constructs an initial high-confidence sample set. Then, to filter out the samples misclassified during the iterations, a novel integrated filter is designed that further selects high-confidence samples from the initial set and adds them to the labeled sample set for iterative training. Comparative experiments against 4 related self-training algorithms on 9 datasets show that the proposed algorithm achieves an average accuracy of 67.90% and an average F-score of 65.54%, and that its classification performance is significantly better than that of the comparison algorithms.
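The abstract gives no formulas, so the following is only a minimal sketch of the pipeline it describes, under stated assumptions. The density and peak quantities follow the standard density peak clustering definitions (cutoff-kernel density `rho`, distance `delta` to the nearest denser point). The candidate rule (`rho >= min_rho`) and the filter (unanimous agreement of several k-nearest-neighbor voters) are hypothetical stand-ins chosen for illustration, not the authors' method; all function and parameter names are likewise illustrative.

```python
import math
from collections import Counter


def density_peak_scores(points, d_c):
    """Standard density peak quantities for every point.

    rho[i]   : number of other points closer than the cutoff d_c.
    delta[i] : distance to the nearest point with higher rho; a point with
               no denser neighbor gets its largest distance instead.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def knn_predict(train_x, train_y, x, k):
    """Majority label among the k labeled points nearest to x."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def self_train(labeled_x, labeled_y, unlabeled_x, d_c,
               min_rho=2, ks=(1, 3, 5), max_iter=10):
    """Self-training loop with a density-based candidate set and a voting filter.

    Assumption: a candidate is an unlabeled point with rho >= min_rho
    (delta is computed but not used by this simplified rule).
    Assumption: the filter accepts a candidate only if every k-NN voter
    predicts the same label.
    """
    lx, ly = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_iter):
        if not pool:
            break
        rho, _delta = density_peak_scores(lx + pool, d_c)
        pool_rho = rho[len(lx):]
        usable_ks = [k for k in ks if k <= len(lx)]
        accepted = {}
        for idx, x in enumerate(pool):
            if pool_rho[idx] < min_rho:
                continue  # low-density point: not a high-confidence candidate
            votes = {knn_predict(lx, ly, x, k) for k in usable_ks}
            if len(votes) == 1:  # all voters agree, so the filter passes it
                accepted[idx] = votes.pop()
        if not accepted:
            break  # nothing new was labeled; further iterations change nothing
        for idx, label in accepted.items():
            lx.append(pool[idx])
            ly.append(label)
        pool = [x for i, x in enumerate(pool) if i not in accepted]
    return lx, ly
```

On a toy input with two tight clusters and one isolated point, this sketch labels the cluster members and leaves the isolated point unlabeled, because its density never reaches `min_rho`. Any real use would need the paper's actual selection rule and filter in place of these placeholders.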

Key words

self-training/unlabeled sample/high-confidence sample/density peak/integrated filter

Funding

Natural Science Foundation of Gansu Province (21JR1RA283)

Publication year

2024

Journal of Yibin University (宜宾学院学报)

Yibin University (宜宾学院)

CHSSCD

Impact factor: 0.185

ISSN: 1671-5365

Number of references: 12
