Gini index and decision tree method with mitigated random consistency
The decision tree model has strong interpretability and is the basis of machine learning methods such as random forest and deep forest. Selecting the split attribute and split value of each node is the core problem of the decision tree method, and it affects the generalization ability, depth, balance, and other important properties of the tree. Most traditional attribute-selection criteria are defined as a sum of concave functions, which gives the decision tree algorithm a multivalue bias; that is, it tends to select attributes with many values as node split attributes. In classification tasks, performance evaluation from the perspective of random consistency has been verified to have low classification bias, and evaluation criteria that mitigate random consistency can reduce both classification bias and cluster-number bias. In this paper, the random consistency of the Gini index is mitigated under a purification framework to offset its multivalue bias. Experiments on artificial data sets verify that the resulting pure Gini index alleviates the multivalue bias of the Gini index and selects attributes that carry decision information. Experimental results on twelve benchmark data sets and two image data sets show that the decision tree based on the pure Gini index achieves higher generalization performance than existing decision tree algorithms designed to mitigate multivalue bias.
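The multivalue bias described above can be illustrated with a small sketch. The snippet below (an illustrative example, not the paper's corrected criterion; the data and function names are invented for demonstration) computes the classic weighted Gini index for two attributes: a binary attribute weakly correlated with the class label, and a many-valued ID-like attribute that carries no class information. Because every singleton partition is pure, the ID-like attribute attains a Gini index of 0 and would be preferred by the traditional criterion despite being useless for generalization.

```python
import random

def gini(labels):
    """Gini impurity of a label multiset: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_index(values, labels):
    """Weighted Gini impurity after splitting on an attribute (lower = preferred)."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / n * gini(g) for g in groups.values())

random.seed(0)
n = 200
labels = [random.randint(0, 1) for _ in range(n)]
# Binary attribute that agrees with the label about 70% of the time.
informative = [y if random.random() < 0.7 else 1 - y for y in labels]
# Many-valued "ID-like" attribute: a unique value per example, no class information.
id_like = list(range(n))

print(gini_index(informative, labels))  # strictly positive: some impurity remains
print(gini_index(id_like, labels))      # 0.0: every singleton partition is pure
```

The ID-like attribute "wins" under the plain Gini index precisely because the criterion is a sum of concave functions over the induced partition; a purified criterion of the kind proposed in the paper subtracts the consistency expected under random partitioning, which removes this advantage.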
Keywords: Gini index; bias to multi-value; decision tree; random consistency