Tri-training Algorithm Based on Density Peaks Clustering

Luo Yuhang¹, Wu Runxiu¹, Cui Zhihua², Zhang Yiying³, He Yeshen⁴, Zhao Jia¹

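Author information

1. School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi 330099
2. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan, Shanxi 030024
3. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457
4. Shenzhen Guodian Technology Communication Co., Ltd., Shenzhen, Guangdong 518000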

Abstract

Tri-training exploits unlabeled data to improve the generalization ability of classifiers, but it is prone to mislabeling unlabeled samples, which introduces training noise. To address this, a Tri-training algorithm based on density peaks clustering (Tri-training with density peaks clustering, DPC-TT) is proposed. Density peaks clustering can use cluster centers and local densities to select samples that represent the spatial structure of the data well. DPC-TT applies density peaks clustering to obtain the cluster centers of the training data and the local density of each sample. Samples within the cutoff distance of a cluster center are regarded as representing the spatial structure well and are marked as core data. Updating the classifiers with the core data reduces the training noise introduced during iteration and thereby improves classifier performance. Experimental results show that DPC-TT achieves better classification performance than the standard Tri-training algorithm and its improved variants.
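The core-data selection step described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so the code below assumes the standard density peaks clustering definitions: a cutoff-kernel local density (the number of neighbours closer than the cutoff distance `d_c`) and, for each sample, the distance to the nearest sample of higher density. The function names, the `n_centers` parameter, and the rule of ranking candidate centers by the product of density and distance are illustrative assumptions, not the authors' implementation. The Tri-training loop itself is not shown.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length coordinate tuples."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def local_density(dist, d_c):
    """Cutoff-kernel density: how many other samples lie closer than d_c."""
    n = len(dist)
    return [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
            for i in range(n)]


def relative_distance(dist, rho):
    """Distance from each sample to its nearest neighbour of higher density.

    A sample with no denser neighbour gets its largest distance to any
    sample. Density ties are handled naively in this sketch.
    """
    n = len(dist)
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return delta


def select_core_samples(points, d_c, n_centers):
    """Return (center indices, core indices).

    Centers are the n_centers samples with the largest rho * delta
    (an assumed selection rule). Core samples are those within d_c of
    at least one center, as described in the abstract.
    """
    dist = pairwise_distances(points)
    rho = local_density(dist, d_c)
    delta = relative_distance(dist, rho)
    ranked = sorted(range(len(points)),
                    key=lambda i: rho[i] * delta[i], reverse=True)
    centers = ranked[:n_centers]
    core = [i for i in range(len(points))
            if any(dist[i][c] <= d_c for c in centers)]
    return centers, core
```

In a DPC-TT-style loop, one would presumably call `select_core_samples` on the candidate samples in each round and pass only the returned core indices to the classifier update. How the paper combines this filter with the three classifiers is not stated in the abstract.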

Key words

Tri-training/semi-supervised learning/density peaks clustering/spatial structure/classifier

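Funding

National Natural Science Foundation of China (52069014)

Publication year

2024

Journal of System Simulation

Beijing Simulation Center; Chinese Association for System Simulation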

Indexed in: CSTPCD, CSCD, Peking University Core Journals

Impact factor: 0.551

ISSN: 1004-731X

Number of references: 30