Accurately identifying the classes of imbalanced data with overlapping classes has great theoretical significance and practical value. Based on the Switching ensemble learning framework, this paper first defines the probability that an instance's class label is switched by combining class overlap with the neighborhood distribution, and then develops an ensemble algorithm for classifying imbalanced data with overlapping classes, named SwitchingHD. The algorithm not only improves the visibility of minority samples but also retains their original information as fully as possible, which effectively overcomes the limitations of existing Switching ensemble algorithms on imbalanced data with overlapping classes. Under three evaluation metrics, we compare the classification performance of SwitchingHD with three types of Switching-based ensemble algorithms and two types of traditional ensemble algorithms on 33 imbalanced datasets with overlapping classes. We then analyze how sensitive the classification performance of the six ensemble algorithms is to the proportion of samples to be switched and to the number of base classifiers, and derive the optimal range of the switching proportion as well as the effect of the two factors. The analysis shows that, under AUC, SwitchingHD performs significantly better than the other ensemble algorithms, demonstrating its effectiveness and superiority in classifying imbalanced data with overlapping classes. Finally, taking telecom customer data as an example, we further compare the performance of SwitchingHD and 11 advanced ensemble algorithms in identifying churned customers.
Imbalanced Data Classification; Class Overlapping; Neighborhood Distribution; Switching Algorithm