A Self-Training Ensemble Classification Model for Credit Evaluation Based on Data Editing


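Abstract: To address the imbalance of credit data and the difficulty of obtaining class-labeled data, this paper proposes a self-training ensemble classification model for credit evaluation based on data editing. First, the synthetic minority over-sampling technique (SMOTE) is applied to the labeled samples to alleviate the data imbalance. Second, a Stacking ensemble model is built on the small labeled dataset and used to assign pseudo-labels to the unlabeled samples, which yields additional class-labeled data. Finally, an improved double-weighted semi-supervised K-nearest neighbor algorithm is proposed and used to edit the pseudo-labeled data and expand the training set, repeating until the model converges. Simulation experiments on UCI and Kaggle credit evaluation datasets show that the model achieves better predictive performance and identifies minority-class samples more effectively.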

English title: Self-training credit evaluation integrated classification model based on data editing

English abstract: Aiming at the problems of imbalanced credit data and the difficult acquisition of labeled data, a self-training credit evaluation integrated classification model based on data editing was proposed. Firstly, the synthetic minority over-sampling technique (SMOTE) was used to oversample the labeled samples to alleviate data imbalance. Secondly, a Stacking integration model was constructed on a small labeled dataset, and the unlabeled samples were pseudo-labeled to obtain class-labeled data. Finally, an improved semi-supervised double-weighted K-nearest neighbor algorithm was proposed, which was used to edit the pseudo-labeled data and expand the training set until the model converged. Simulation experiments on UCI and Kaggle credit evaluation datasets show that the model has better predictive performance and can identify minority-class samples more effectively.
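The three stages named in the abstract (oversampling, pseudo-labeling, and editing inside a self-training loop) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a nearest-centroid classifier stands in for the Stacking ensemble, a plain distance-weighted K-nearest neighbor vote stands in for the improved double-weighted algorithm (whose details the abstract does not give), and every function name, parameter, threshold, and data point below is hypothetical.

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def smote_like(minority, n_new, k=3, rng=None):
    """Interpolate between a minority sample and one of its k nearest
    minority neighbours (the core idea of SMOTE). Needs >= 2 samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not x),
                            key=lambda m: dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

def fit_centroids(X, y):
    """Placeholder base learner (assumption: NOT the paper's Stacking model)."""
    cents = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        cents[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return cents

def predict(cents, x):
    return min(cents, key=lambda label: dist(cents[label], x))

def weighted_knn(x, X, y, k=3):
    """Distance-weighted kNN vote; returns (label, share of total weight).
    Assumption: a generic stand-in for the paper's double-weighted variant."""
    nearest = sorted(zip(X, y), key=lambda p: dist(x, p[0]))[:k]
    votes = {}
    for pt, label in nearest:
        votes[label] = votes.get(label, 0.0) + 1.0 / (dist(x, pt) + 1e-9)
    best = max(votes, key=votes.get)
    return best, votes[best] / sum(votes.values())

def self_train(X_lab, y_lab, X_unl, minority_label=1, k=3,
               min_conf=0.8, max_rounds=10, seed=0):
    rng = random.Random(seed)
    X, y = list(X_lab), list(y_lab)
    # Stage 1: oversample the minority class until the two classes balance.
    minority = [x for x, t in zip(X, y) if t == minority_label]
    deficit = (len(X) - len(minority)) - len(minority)
    if deficit > 0 and len(minority) > 1:
        X += smote_like(minority, deficit, k, rng)
        y += [minority_label] * deficit
    pool = list(X_unl)
    for _ in range(max_rounds):
        model = fit_centroids(X, y)                  # Stage 2: fit the model
        kept, rest = [], []
        for x in pool:
            pseudo = predict(model, x)               # assign a pseudo-label
            label, conf = weighted_knn(x, X, y, k)   # Stage 3: edit it
            ok = label == pseudo and conf >= min_conf
            (kept if ok else rest).append((x, pseudo))
        if not kept:                                 # nothing new: stop
            break
        X += [x for x, _ in kept]
        y += [t for _, t in kept]
        pool = [x for x, _ in rest]
    return fit_centroids(X, y), X, y

# Toy data: four majority samples (label 0) and two minority samples (label 1).
X_lab = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (5.0, 5.0), (5.2, 5.1)]
y_lab = [0, 0, 0, 0, 1, 1]
X_unl = [(0.1, 0.1), (5.1, 5.0)]
model, X_aug, y_aug = self_train(X_lab, y_lab, X_unl)
print(predict(model, (4.8, 4.9)))
```

The editing step is what separates this loop from plain self-training: a pseudo-label enters the training set only when the neighborhood vote agrees with the model and carries enough weight, so doubtful samples stay in the unlabeled pool for a later round.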

English keywords:

credit evaluation; semi-supervised learning; Stacking integration strategy; data editing; self-training

Authors:

Liu Wenjie (刘文杰), Wang Guoqiang (王国强)

Affiliation:

School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620

Keywords:

credit evaluation; semi-supervised learning; Stacking ensemble strategy; data editing; self-training

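Funding:

National Natural Science Foundation of China (General Program); Pudong New Area Science and Technology Development Fund, Industry-University-Research Special Fund (Artificial Intelligence); National Statistical Science Research Project (General Project)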

Grant numbers:

11971302; PKX2020-R02; 2020LY067

Publication year:

2024

Journal of Shanghai University of Engineering Science

Shanghai University of Engineering Science

Impact factor: 0.264

ISSN: 1009-444X

Year, volume (issue): 2024, 38(1)

Number of references: 23