
Comparison and application of group variable selection methods for high-dimensional small sample data

Variable selection for high-dimensional, small-sample grouped data is one of the main problems in statistics. With the rapid development of genomic informatics, high-dimensional small-sample data have become commonplace, which poses a challenging task for statistical modeling. Some of these data sets exhibit a group structure; a method that selects variables one at a time ignores the grouping information, which can substantially degrade variable selection. This paper therefore introduces several variable selection methods for high-dimensional data and grouped data sets, and evaluates them through numerical simulation and empirical analysis. The results show that, for high-dimensional small-sample grouped data, the grLasso method gives better variable selection and model goodness of fit when the variable dimension is below 50, whereas the grMCP, grSubset+grLasso, and grSubset methods give better variable selection and model goodness of fit when the variable dimension is above 50.
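The difference between selecting variables one at a time and selecting them by group can be illustrated with a small sketch. This is not the paper's implementation: the function names, the toy coefficients, the threshold value, and the square-root-of-group-size weighting are all assumptions chosen for illustration. It contrasts elementwise soft-thresholding (the shrinkage step behind the ordinary lasso) with group-wise soft-thresholding (the proximal operator of a group lasso penalty).

```python
import math

def soft_threshold(v, t):
    """Elementwise soft-thresholding: each coefficient is shrunk on its own."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def group_soft_threshold(v, groups, t):
    """Group-wise soft-thresholding: each group is kept or dropped as a whole.

    groups is a list of index lists that partition v (hypothetical layout).
    """
    out = [0.0] * len(v)
    for idx in groups:
        norm = math.sqrt(sum(v[i] ** 2 for i in idx))
        # Assumption: threshold scaled by sqrt(group size), a common convention.
        thr = t * math.sqrt(len(idx))
        scale = max(0.0, 1.0 - thr / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * v[i]
    return out

# Toy coefficients: the first group holds one strong signal, the second is weak.
coefs = [3.0, 0.2, -0.4, 0.3, 0.1, -0.2]
groups = [[0, 1, 2], [3, 4, 5]]

lasso_result = soft_threshold(coefs, 0.25)
group_result = group_soft_threshold(coefs, groups, 0.25)

print("elementwise:", lasso_result)
print("group-wise: ", group_result)
```

In this toy setting the elementwise rule keeps and drops coefficients inside both groups, while the group-wise rule retains every member of the first group and zeroes the second group entirely, which is the behaviour the abstract refers to when it says grouping information is otherwise ignored.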

high-dimensional small sample; group structure; variable selection

Li Dongsheng (李东升), Qiu Yuting (邱宇婷)


School of Mathematics and Statistics, Qiannan Normal University for Nationalities, Duyun, Guizhou 558000

Xiangcai School Affiliated to Hunan Normal University, Duyun, Guizhou 558000


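Guizhou Provincial Department of Education Youth Talent Growth Project (黔教技[2022]380号); Qiannan Prefecture Philosophy and Social Science Theoretical Innovation Project (Qnsk-2022-021)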

2024

Journal of Shangqiu Normal University (商丘师范学院学报)
Shangqiu Normal University

CHSSCD
Impact factor: 0.211
ISSN: 1672-3600
Year, volume (issue): 2024, 40(6)