The Application of a Random Forest-based Variable Hunting Method to Variable Selection in High-dimensional Data
Objective: To explore the application of a random forest-based variable hunting approach to variable selection in high-dimensional data.
Methods: Two variable hunting methods (vh.md and vh.vimp) were compared with backward variable elimination using random forest (varSelRF) on simulated data and real metabonomics data. The approaches were evaluated by the number of selected variables, the prediction error rate (PE), and the area under the receiver operating characteristic curve (AUC).
Results: The simulation experiments suggested that the variable hunting method was more effective than varSelRF and the sorted VIMP method when the data contained combined effects, interactions, and weak independent effects. Analysis of the metabonomics data confirmed that the variable hunting method gave stable variable selection results with favorable predictive performance.
Conclusion: The variable hunting approach is applicable to variable selection in high-dimensional data and has practical value.
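The two predictive measures named in the Methods, PE and AUC, can be illustrated with a minimal sketch. This is not the authors' code: the function names `prediction_error` and `auc` and the toy labels and scores are assumptions made for illustration only. Here the AUC is computed as the proportion of positive/negative sample pairs in which the positive sample receives the higher score, with ties counted as one half.

```python
def prediction_error(y_true, y_pred):
    """Prediction error rate (PE): the fraction of misclassified samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Labels are assumed to be coded 1 (positive) and 0 (negative).
    Tied scores contribute 0.5 to the count.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four samples with class labels and predicted scores.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
# Classify as positive when the score is at least 0.5 (an assumed cutoff).
y_pred = [1 if s >= 0.5 else 0 for s in scores]

pe_value = prediction_error(y_true, y_pred)
auc_value = auc(y_true, scores)
```

A lower PE and a higher AUC both indicate better prediction from the selected variables. PE depends on the chosen classification cutoff, whereas AUC depends only on how the scores rank the samples.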