Self-training ensemble classification model for credit evaluation based on data editing
To address the class imbalance of credit data and the difficulty of obtaining labeled data, a self-training ensemble classification model for credit evaluation based on data editing is proposed. First, the synthetic minority over-sampling technique (SMOTE) is applied to the labeled samples to alleviate class imbalance. Second, a Stacking ensemble model is built on the small labeled dataset and used to assign pseudo-labels to the unlabeled samples. Finally, an improved semi-supervised double-weighted K-nearest neighbor algorithm is proposed to edit the pseudo-labeled data, and the retained samples are added to the training set; this is repeated until the model converges. Simulation experiments on UCI and Kaggle credit evaluation datasets show that the model achieves better predictive performance and identifies minority-class samples more effectively.
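The three steps described in the abstract can be illustrated with a minimal, standard-library Python sketch on toy two-dimensional data. It is an illustration under stated assumptions, not the authors' implementation: a nearest-centroid classifier stands in for the Stacking ensemble, a plain distance-weighted kNN vote stands in for the improved double-weighted K-nearest neighbor algorithm (whose weighting scheme the abstract does not specify), and "convergence" is taken to mean that a round accepts no new pseudo-labeled samples. All function names and data points are hypothetical.

```python
import math
import random

def smote(minority, n_new, k=2, rng=None):
    """SMOTE-style oversampling: interpolate between a minority point
    and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

def centroid_predict(X, y, x):
    """Assumed stand-in base learner (nearest class centroid);
    the paper uses a Stacking ensemble at this step."""
    best_label, best_d = None, float("inf")
    for label in sorted(set(y)):
        pts = [p for p, l in zip(X, y) if l == label]
        centre = tuple(sum(col) / len(pts) for col in zip(*pts))
        d = math.dist(x, centre)
        if d < best_d:
            best_label, best_d = label, d
    return best_label

def knn_agrees(X, y, x, label, k=3):
    """Data editing step (assumed form): keep a pseudo-labeled sample only
    if a distance-weighted kNN vote over the training set gives the same label."""
    nearest = sorted(zip(X, y), key=lambda t: math.dist(x, t[0]))[:k]
    votes = {}
    for p, l in nearest:
        votes[l] = votes.get(l, 0.0) + 1.0 / (math.dist(x, p) + 1e-9)
    return max(votes, key=votes.get) == label

def self_train(X, y, unlabeled, max_rounds=10):
    """Pseudo-label, edit, and expand the training set until no sample
    is accepted in a round (assumed convergence criterion)."""
    X, y, pool = list(X), list(y), list(unlabeled)
    for _ in range(max_rounds):
        accepted, rejected = [], []
        for x in pool:
            label = centroid_predict(X, y, x)
            if knn_agrees(X, y, x, label):
                accepted.append((x, label))
            else:
                rejected.append(x)
        if not accepted:
            break
        X += [x for x, _ in accepted]
        y += [l for _, l in accepted]
        pool = rejected
        if not pool:
            break
    return X, y, pool

# Toy data: class 0 is the majority, class 1 the minority.
majority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.0, 0.2), (0.2, 0.3)]
minority = [(2.0, 2.0), (2.2, 2.1), (2.1, 1.8)]

# Step 1: oversample the minority class until both classes have equal size.
synthetic = smote(minority, len(majority) - len(minority))
minority_balanced = minority + synthetic
X = majority + minority_balanced
y = [0] * len(majority) + [1] * len(minority_balanced)

# Steps 2 and 3: pseudo-label the unlabeled pool and keep only edited samples.
unlabeled = [(0.1, 0.1), (2.1, 2.0), (1.9, 2.1), (0.25, 0.15)]
X_final, y_final, leftover = self_train(X, y, unlabeled)
print(len(X_final), len(leftover))
```

Oversampling is applied only to the labeled set before self-training starts, matching the order given in the abstract. In a realistic setting the stand-in components would be replaced by a stacked ensemble and the paper's weighted KNN filter.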