In this paper, an effective improved decision tree model, R-C4.5, is introduced. Based on C4.5, the model improves attribute selection and partitioning rules. In R-C4.5, for every candidate attribute, the entropy of each corresponding subset and the average entropy are calculated. The subsets whose entropies are not less than the average are then united into a temporary composite subset; that is, the branches with poor classification effect are merged. The entropies of the temporary composite subset and of the remaining subsets are used to modify the information gain of the node, and the attribute with the highest modified information gain is chosen as the test attribute for the current node. R-C4.5 enhances the interpretability of test attribute selection, reduces the number of insignificant or empty branches, and avoids overfitting.
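The attribute-selection step described above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: function and variable names are assumptions, and the average here is taken as the unweighted mean of the subset entropies.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def modified_gain(rows, labels, attr_index):
    """Modified information gain as sketched in the abstract:
    subsets whose entropy is not less than the average subset entropy
    are merged into one temporary composite subset before the usual
    gain formula is applied. Names and details here are assumptions."""
    # Partition the labels by the value of the chosen attribute.
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr_index], []).append(y)
    subsets = list(parts.values())
    ents = [entropy(s) for s in subsets]
    avg = sum(ents) / len(ents)
    # Unite the poorly classifying branches (entropy >= average).
    merged, kept = [], []
    for s, e in zip(subsets, ents):
        (merged if e >= avg else kept).append(s)
    composite = [y for s in merged for y in s]
    groups = kept + ([composite] if composite else [])
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)
```

At a node, this score would be computed for every candidate attribute and the attribute with the highest value chosen as the test attribute, exactly as standard C4.5 does with its own gain criterion.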
decision tree; C4.5; R-C4.5; classification; data mining
LIU Peng
Department of Information Systems, Shanghai University of Finance and Economics, Shanghai 200433,China
Progress in Intelligence Computation & Applications
International Symposium on Intelligence Computation & Applications (ISICA'2005); 2005-04-04/06; Wuhan (CN)