Overview of Robust Variable Selection Methods for High-Dimensional Linear Regression Model

With the arrival of the big data era, high-dimensional data are frequently collected in many research fields, such as economics, finance, and biomedicine. One characteristic of high-dimensional data is that the number of variables p grows with the sample size n and usually exceeds it. At the same time, outliers are prone to appear in high-dimensional data. How to limit the influence of outliers on high-dimensional statistical inference, and thereby obtain a more accurate model, is therefore one of the active topics in current statistical research. This paper reviews robust variable selection methods for high-dimensional linear models. Specifically, we first introduce three measures for evaluating robustness: the influence function, the breakdown point, and the maximum bias. We then focus on robust variable selection methods, covering the case where the response contains outliers, the case where both the response and the covariates contain outliers, and variable selection methods that combine a high breakdown point with high efficiency. Next, we introduce the related algorithms and compare different variable selection methods through simulations and real-data examples. Finally, we briefly discuss open problems in robust and efficient high-dimensional variable selection and possible directions for future development.

high-dimensional linear regression model; robust; variable selection; efficient

Zou Hang, Jiang Yunlu

School of Economics, Jinan University, Guangzhou 510632

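National Natural Science Foundation of China (12171203); Natural Science Foundation of Guangdong Province (2022A1515010045); Fundamental Research Funds for the Central Universities (23JNQMX21)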

2024

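Chinese Journal of Applied Probability and Statistics
Chinese Society of Probability and Statistics, Chinese Mathematical Society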

CSTPCD; Peking University Core Journal
Impact factor: 0.263
ISSN: 1001-4268
Year, volume (issue): 2024, 40(1)