A Highly Efficient Parameter-Pruning Algorithm for Decision Trees on Large Datasets
Decision trees (DTs) perform well on data classification but are prone to overfitting. The standard remedy is to prune the DT; however, existing pruning algorithms have shortcomings: pre-pruning is prone to underfitting, post-pruning is time-consuming, and grid-search pruning is only suitable for small datasets. This study proposes an efficient parameter-pruning algorithm for DTs to address these problems. Based on a network security situation awareness model, the architecture of a pruned-decision-tree situation awareness system is established, and the network's data flow is analyzed. While the DT is generated, enumeration and binary search are used to determine its maximum depth, and a depth-first search is used to determine the minimum number of samples required to split a node and the maximum number of features. Finally, the three optimal parameters are combined to prune the tree from top to bottom. Experimental results show that the algorithm carries a low risk of overfitting on large datasets: the accuracy on both the training and testing sets exceeds 95%, and the algorithm is almost 20 times faster than Pessimistic Error Pruning (PEP), the best-performing post-pruning algorithm.
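The three pruning parameters named above correspond naturally to scikit-learn's max_depth, min_samples_split, and max_features. The sketch below is not the authors' implementation; it only illustrates, under assumed thresholds, candidate lists, and a synthetic dataset, how a binary search over the maximum depth followed by a greedy scan of the other two parameters could be combined into a single parameter-pruning pass.

```python
# Minimal sketch (not the paper's exact method): binary search for the smallest
# max_depth that keeps validation accuracy near the deepest tree's accuracy,
# then a greedy scan of min_samples_split and max_features. The dataset,
# tolerance, depth range, and candidate lists are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

def val_accuracy(max_depth, min_samples_split=2, max_features=None):
    """Train a tree with the given pruning parameters and score it on the validation set."""
    clf = DecisionTreeClassifier(max_depth=max_depth,
                                 min_samples_split=min_samples_split,
                                 max_features=max_features,
                                 random_state=0)
    clf.fit(X_train, y_train)
    return clf.score(X_val, y_val)

# Binary search for the smallest depth whose validation accuracy is within a
# small tolerance of the fully grown (deepest) tree's accuracy.
lo, hi = 1, 30
best_acc = val_accuracy(hi)
while lo < hi:
    mid = (lo + hi) // 2
    if val_accuracy(mid) >= best_acc - 0.005:   # tolerance is an assumed hyperparameter
        hi = mid                                 # a shallower tree suffices: prune deeper levels
    else:
        lo = mid + 1
best_depth = hi

# Greedy refinement of the remaining two parameters, holding earlier choices fixed.
best_split = max((2, 5, 10, 20, 50),
                 key=lambda s: val_accuracy(best_depth, min_samples_split=s))
best_feat = max((None, "sqrt", "log2"),
                key=lambda f: val_accuracy(best_depth, best_split, f))

print(best_depth, best_split, best_feat)
```

Because the depth search is logarithmic in the depth range and each remaining parameter is scanned over a short candidate list, the number of trees trained stays small, which is consistent with the abstract's claim of a large speedup over exhaustive post-pruning.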