Comparison and application of group variable selection methods for high-dimensional small sample data
Variable selection for high-dimensional, small-sample grouped data is one of the main problems in statistics. With the rapid development of genomic informatics, high-dimensional small-sample data have become ubiquitous, which poses challenges for statistical modeling. Some of these data sets exhibit a group structure. If a univariate selection method is used, the grouping information is ignored, which may significantly reduce the effectiveness of variable selection. Motivated by this, this paper introduces several variable selection methods for high-dimensional data and grouped data sets, and evaluates them through numerical simulation and empirical analysis. The results show that, for high-dimensional small-sample grouped data sets, the grLasso method performs better in variable selection and model goodness of fit when the number of variables is below 50; when the number of variables is above 50, the grMCP, grSubset+grLasso, and grSubset methods perform better in variable selection and model goodness of fit.
high-dimensional small sample; group structure; variable selection