Self-training method based on semi-supervised clustering and data editing
To address two problems of self-training, namely that the high-confidence unlabeled samples selected in each iteration carry little information and that unlabeled samples are easily mislabeled, a Naive Bayes self-training method based on semi-supervised clustering and data editing was proposed. First, semi-supervised clustering was applied to a small number of labeled samples together with a large number of unlabeled samples, and the unlabeled samples with high cluster membership were selected and then classified by Naive Bayes. Second, a data editing technique was used to filter out those high-membership unlabeled samples that Naive Bayes had misclassified. Because this data editing technique filters noise using information from both the labeled and the unlabeled samples, it avoids the performance degradation that traditional data editing can suffer when labeled samples are scarce. Comparative experiments on UCI datasets verified the effectiveness of the proposed algorithm.
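
The loop described above can be sketched in a few dozen lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, the editing rule, the Naive Bayes variant, or any thresholds. Here the semi-supervised clustering is assumed to be a seeded k-means with an inverse-squared-distance (fuzzy c-means style) membership, Naive Bayes is assumed to be Gaussian, and data editing is assumed to be a k-nearest-neighbor agreement check over labeled samples plus the other candidates. All function names and the parameters `threshold`, `rounds`, `iters`, and `k` are hypothetical.

```python
import math
from collections import Counter

def dist2(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def seeded_memberships(X_lab, y_lab, X_unl, iters=10):
    # Assumed clustering: centroids are seeded from labeled class means and
    # labeled points keep their class. Returns one membership value per
    # unlabeled point: its normalized inverse squared distance to the
    # nearest centroid.
    classes = sorted(set(y_lab))
    mean = lambda pts: [sum(col) / len(pts) for col in zip(*pts)]
    lab_pts = {c: [x for x, y in zip(X_lab, y_lab) if y == c] for c in classes}
    cents = {c: mean(lab_pts[c]) for c in classes}
    for _ in range(iters):
        assign = [min(classes, key=lambda c: dist2(x, cents[c])) for x in X_unl]
        for c in classes:
            cents[c] = mean(lab_pts[c] + [x for x, a in zip(X_unl, assign) if a == c])
    memb = []
    for x in X_unl:
        inv = [1.0 / (dist2(x, cents[c]) + 1e-12) for c in classes]
        memb.append(max(inv) / sum(inv))
    return memb

def fit_gnb(X, y):
    # Gaussian Naive Bayes: per-class log prior, feature means and variances.
    model = {}
    for c in set(y):
        pts = [x for x, t in zip(X, y) if t == c]
        mu = [sum(col) / len(pts) for col in zip(*pts)]
        var = [sum((v - m) ** 2 for v in col) / len(pts) + 1e-6
               for col, m in zip(zip(*pts), mu)]
        model[c] = (math.log(len(pts) / len(X)), mu, var)
    return model

def predict_gnb(model, x):
    def log_post(c):
        prior, mu, var = model[c]
        return prior + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                           for xi, m, v in zip(x, mu, var))
    return max(model, key=log_post)

def edit_filter(cand_X, cand_y, X_lab, y_lab, k=3):
    # Assumed editing rule: keep a candidate only if its Naive Bayes label
    # matches the majority label of its k nearest neighbors, drawn from the
    # labeled samples and the other pseudo-labeled candidates.
    keep = []
    for i, (x, lab) in enumerate(zip(cand_X, cand_y)):
        ref = list(zip(X_lab, y_lab))
        ref += [(p, l) for j, (p, l) in enumerate(zip(cand_X, cand_y)) if j != i]
        ref.sort(key=lambda pl: dist2(x, pl[0]))
        vote = Counter(l for _, l in ref[:k]).most_common(1)[0][0]
        keep.append(vote == lab)
    return keep

def self_train(X_lab, y_lab, X_unl, threshold=0.8, rounds=5):
    X_lab, y_lab, X_unl = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(rounds):
        if not X_unl:
            break
        model = fit_gnb(X_lab, y_lab)
        memb = seeded_memberships(X_lab, y_lab, X_unl)
        # Step 1: select high-membership unlabeled samples, label them with NB.
        idx = [i for i, u in enumerate(memb) if u >= threshold]
        cand_X = [X_unl[i] for i in idx]
        cand_y = [predict_gnb(model, x) for x in cand_X]
        # Step 2: data editing removes candidates that look mislabeled.
        keep = edit_filter(cand_X, cand_y, X_lab, y_lab)
        accepted = {i: lab for i, lab, ok in zip(idx, cand_y, keep) if ok}
        if not accepted:
            break
        for i in sorted(accepted):
            X_lab.append(X_unl[i])
            y_lab.append(accepted[i])
        X_unl = [x for i, x in enumerate(X_unl) if i not in accepted]
    return fit_gnb(X_lab, y_lab), X_lab, y_lab
```

In this sketch the membership threshold controls which unlabeled samples are even considered, while the editing step acts as a second, independent check on the labels Naive Bayes assigns to them; a sample that passes neither test simply stays in the unlabeled pool for a later round.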