Application of a Random Forest-Based Variable Hunting Method to Variable Selection in High-Dimensional Data

Objective: To explore the application of random forest (RF)-based variable hunting methods to variable selection in high-dimensional data. Methods: Two variable hunting methods (vh.md, vh.vimp) were compared with a stepwise backward elimination method (varSelRF) using simulation experiments and real metabonomics data. The methods were evaluated by the number of variables selected, the model prediction error rate (PE), and the area under the receiver operating characteristic curve (AUC). Results: In the simulation experiments, the variable hunting methods clearly outperformed both varSelRF and ranking all variables by variable importance (VIMP) when variables had joint effects, interaction effects, or weak independent effects. In the analysis of the real metabonomics data, the variable hunting methods gave stable selection results while maintaining good predictive performance. Conclusion: The variable hunting method is suitable for variable selection in high-dimensional data and has practical value.

Keywords: Random forest; Variable selection; Variable hunting

Song Qianqian (宋欠欠), Li Yiqun (李轶群), Hou Yan (侯艳), Li Kang (李康)

Department of Health Statistics, Harbin Medical University, 150081

Department of Bioinformatics, Harbin Medical University

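Funding: National Natural Science Foundation of China (81172767); Specialized Research Fund for the Doctoral Program of Higher Education (20122307110004)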

2015

Chinese Journal of Health Statistics (中国卫生统计)
Chinese Health Information Association; China Medical University

Indexed in: CSTPCD; CSCD; Peking University Core Journals
Impact factor: 1.172
ISSN:1002-3674
Year, volume (issue): 2015, 32(1)