Theoretical comparison between the Gini Index and Information Gain criteria

Knowledge Discovery in Databases (KDD) is an active and important research area that promises a high payoff in many business and scientific applications. One of the main tasks in KDD is classification, and a particularly efficient method for classification is decision tree induction. The selection of the attribute used to split the data at each node of the tree (the split criterion) is crucial for classifying objects correctly. Different split criteria have been proposed in the literature (Information Gain, Gini Index, etc.), and it is not obvious which of them will produce the best decision tree for a given data set. A large number of empirical tests have been conducted to answer this question, but no conclusive results were found. In this paper we introduce a formal methodology that allows us to compare multiple split criteria. This permits us to present fundamental insights into the decision process. Furthermore, we are able to present a formal description of how to select between split criteria for a given data set. As an illustration we apply the methodology to two widely used split criteria: Gini Index and Information Gain.
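
For readers unfamiliar with the two criteria named in the abstract, the following is a minimal sketch of how each one scores a candidate split, using the standard textbook definitions (Gini impurity and Shannon entropy). It is not the paper's formal methodology. The function names (`gini`, `entropy`, `split_gain`) and the toy labels are hypothetical and chosen only for illustration, and every label list is assumed to be non-empty.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_i p_i^2 over the class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum_i p_i * log2(p_i)."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


def split_gain(parent, children, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children.

    With impurity=entropy this is Information Gain; with impurity=gini it is
    the reduction in Gini impurity. Assumes every child list is non-empty.
    """
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical toy data: 10 objects, two classes, two candidate binary splits.
parent = ["yes"] * 5 + ["no"] * 5
split_a = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
split_b = [["yes"] * 5 + ["no"] * 3, ["no"] * 2]

for name, split in (("A", split_a), ("B", split_b)):
    print(name,
          "gini reduction:", round(split_gain(parent, split, gini), 4),
          "information gain:", round(split_gain(parent, split, entropy), 4))
```

Both criteria are scored through the same `split_gain` helper so that the only difference between them is the impurity function passed in. At each node, a tree builder would pick the split with the largest value under the chosen criterion.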

decision trees; classification; Gini index; information gain; theoretical comparison

Laura Elena Raileanu, Kilian Stoffel

University of Neuchatel, Computer Science Department, Pierre-a-Mazel 7, CH-2000 Neuchatel, Switzerland

2004

Annals of Mathematics and Artificial Intelligence

ISTP
ISSN: 1012-2443
Year, volume (issue): 2004, 41(1)