Self-training credit evaluation integrated classification model based on data editing

To address the class imbalance of credit data and the difficulty of obtaining labeled data, a self-training ensemble classification model for credit evaluation based on data editing is proposed. First, the synthetic minority over-sampling technique (SMOTE) is applied to the labeled samples to alleviate the imbalance. Second, a Stacking ensemble model is built on the small labeled dataset and used to assign pseudo-labels to the unlabeled samples, yielding additional class-labeled data. Finally, an improved double-weighted semi-supervised K-nearest neighbor algorithm is proposed and used to edit the pseudo-labeled data and expand the training set, repeating until the model converges. Simulation experiments on UCI and Kaggle credit evaluation datasets show that the model achieves better predictive performance and identifies minority-class samples more effectively.
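The pipeline described in the abstract (oversample, pseudo-label, edit, retrain) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify the Stacking base learners or the weighting scheme of the improved K-nearest neighbor algorithm, so a nearest-centroid classifier stands in for the Stacking ensemble and a plain distance-weighted k-NN vote stands in for the double-weighted editing step. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random

def smote(minority, n_new, k=3, rng=None):
    """Basic SMOTE idea: interpolate between a minority sample and one of
    its k nearest minority neighbours to create n_new synthetic samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        t = rng.random()
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(x, nb)))
    return synthetic

def fit_centroids(X, y):
    """Stand-in base model (assumption): one centroid per class."""
    cents = {}
    for label in set(y):
        pts = [x for x, l in zip(X, y) if l == label]
        cents[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return cents

def predict(cents, x):
    """Assign the label of the closest class centroid."""
    return min(cents, key=lambda l: math.dist(x, cents[l]))

def knn_agrees(x, label, X, y, k=3):
    """Editing step (assumption): keep a pseudo-label only if an
    inverse-distance-weighted k-NN vote over the training set agrees."""
    nearest = sorted(zip(X, y), key=lambda p: math.dist(x, p[0]))[:k]
    votes = {}
    for p, l in nearest:
        votes[l] = votes.get(l, 0.0) + 1.0 / (math.dist(x, p) + 1e-9)
    return max(votes, key=votes.get) == label

def self_train(X, y, unlabeled, max_rounds=10):
    """Pseudo-label the pool, keep only labels that survive editing, and
    retrain until no new sample is accepted (used here as 'convergence')."""
    X, y, pool = list(X), list(y), list(unlabeled)
    for _ in range(max_rounds):
        model = fit_centroids(X, y)
        pseudo = [(u, predict(model, u)) for u in pool]
        accepted = [(u, l) for u, l in pseudo if knn_agrees(u, l, X, y)]
        if not accepted:
            break
        for u, l in accepted:
            X.append(u)
            y.append(l)
        kept = {u for u, _ in accepted}
        pool = [u for u in pool if u not in kept]
    return fit_centroids(X, y), X, y

# Toy data: label 0 = majority class, label 1 = minority class.
good = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4), (0.4, 0.0)]
bad = [(2.0, 2.0), (2.2, 1.9)]
bad += smote(bad, n_new=len(good) - len(bad), k=1)  # balance the classes
X = good + bad
y = [0] * len(good) + [1] * len(bad)
unlabeled = [(0.1, 0.1), (2.1, 2.0), (0.3, 0.3), (1.9, 2.1)]
model, X_aug, y_aug = self_train(X, y, unlabeled)
```

In this sketch the loop stops when the editing step accepts no further pseudo-labeled samples. The paper's actual convergence criterion may differ.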

credit evaluation; semi-supervised learning; Stacking ensemble strategy; data editing; self-training

Liu Wenjie (刘文杰), Wang Guoqiang (王国强)

School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China

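Supported by the General Program of the National Natural Science Foundation of China (11971302); the Pudong New Area Science and Technology Development Fund, Industry-University-Research Special Fund (Artificial Intelligence) project (PKX2020-R02); and the General Project of the National Statistical Science Research Program (2020LY067).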

2024

Journal of Shanghai University of Engineering Science (上海工程技术大学学报)
Shanghai University of Engineering Science

Impact factor: 0.264
ISSN: 1009-444X
Year, volume (issue): 2024, 38(1)