数值计算与计算机应用 (Journal on Numerical Methods and Computer Applications), 2024, Vol. 45, Issue 2: 154-173. DOI: 10.12288/szjs.s2023-0924

THE OPTIMAL TUNING PARAMETER SELECTION FOR REGULARIZED REGRESSION MODELS

尚盼 (Pan Shang)¹, 孔令臣 (Lingchen Kong)¹

Author information

  • 1. School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China

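摘要 (Abstract, translated from the Chinese)

The development of modern science and technology has produced large amounts of high-dimensional data in many fields, that is, data whose number of features is greater than, or far greater than, the number of samples. To handle such data, recent years have seen extensive research on regularized regression models, which use a tuning parameter to combine a loss function and a regularization term into a single objective function; the well-known LASSO model and its relatives are examples. It is well known that choosing the optimal tuning parameter of a regularized regression model is crucial. Theoretically, the parameter characterizes properties of the model solution (such as sparsity or low rank) and thereby determines how well the model fits the data. Computationally, both the cost of solving the model and the quality of the computed solution differ across tuning parameters. Apart from a few special classes of models that do not require optimal tuning parameter selection, current selection methods fall into three main classes: cross validation, information criteria, and bilevel programming. Cross validation and information criteria compare the model solutions obtained under different tuning parameters, so both require solving the model many times. In addition, how to set the candidate tuning parameters more reasonably needs further consideration. To reduce the computational cost of selection by cross validation and information criteria, researchers in statistics, optimization, and machine learning have developed various screening rules, which delete features that play no role at a given tuning parameter, thereby speeding up the computation of model solutions and hence the selection of the optimal tuning parameter. Unlike cross validation and information criteria, bilevel programming formulates optimal tuning parameter selection as a bilevel programming model and obtains the selected parameter directly by solving that model. This paper reviews existing results from two perspectives, the methods for optimal tuning parameter selection and their acceleration, and on that basis proposes directions for future research.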
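The model classes named in the abstract can be written down compactly. The block below is a notational sketch only: the symbols (design matrix X, response y, coefficients β, tuning parameter λ, loss L, regularizer R) and the 1/(2n) scaling of the LASSO loss are assumptions chosen for illustration, not notation taken from the paper, and the bilevel form shown is one generic way of posing the selection problem.

```latex
\begin{align*}
% Generic regularized regression: loss plus lambda times regularizer
&\min_{\beta \in \mathbb{R}^p} \; L(\beta; X, y) + \lambda\, R(\beta),
  \qquad \lambda > 0, \\[4pt]
% LASSO as a special case (squared loss, l1 regularizer; scaling assumed)
&\min_{\beta \in \mathbb{R}^p} \; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  + \lambda\,\lVert \beta \rVert_1, \\[4pt]
% Generic bilevel formulation: outer problem over lambda, inner problem over beta
&\min_{\lambda > 0} \; L_{\mathrm{val}}\bigl(\hat\beta(\lambda)\bigr)
  \quad \text{s.t.} \quad
  \hat\beta(\lambda) \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p}
  \; L_{\mathrm{train}}(\beta) + \lambda\, R(\beta).
\end{align*}
```

Here L_val and L_train stand for a loss evaluated on held-out and on training data, respectively; both names are hypothetical placeholders.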

Abstract

High-dimensional data sets, in which the number of features is greater than or far greater than the sample size, arise in many fields. To deal with them, there has been a great deal of research on regularized models, whose objective function is composed of a loss function and a regularization term combined through a tuning parameter. It is well known that tuning parameter selection is very important. Theoretically, this parameter characterizes properties of the model solution and determines how well the model performs. Practically, the computational cost and the quality of the computed solution differ under different tuning parameters. As far as we know, there are three main methods for selecting the optimal tuning parameter: cross validation, information criteria, and bilevel programming. Cross validation and information criteria both incur large computational costs, because they need to compute solutions under many different tuning parameters. Besides that, how to appropriately choose the sequence of candidate tuning parameters is an essential problem. To reduce the computational cost of cross validation and information criteria, screening rules have been proposed to eliminate inactive features in data sets and speed up the tuning parameter selection procedure. Compared with these popular approaches, transforming the tuning parameter selection problem into a bilevel program is more direct, but it usually leads to a nonconvex optimization problem and still needs to be explored. This paper reviews existing work from the perspectives of tuning parameter selection methods and of acceleration, respectively. Based on these, we propose directions for future work.
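To make concrete why cross validation requires solving the model many times, here is a minimal sketch of K-fold cross validation over a grid of candidate tuning parameters for the LASSO. It is an illustration under stated assumptions, not the method of any work reviewed in the paper: the objective scaling (1/(2n)), the plain coordinate-descent solver, the log-spaced grid starting at the smallest λ that yields the zero solution, and all function names (`soft_threshold`, `lasso_cd`, `lambda_grid`, `cv_select`) are choices made here. It also assumes centered data and fits no intercept.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100, tol=1e-6, beta0=None):
    """Coordinate descent for (1/(2n)) * ||y - X b||_2^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * ||X_j||^2 for each column
    resid = y - X @ beta                   # current residual y - X beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                   # an all-zero column never enters
            # (1/n) * X_j^T (partial residual that excludes feature j)
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid -= X[:, j] * delta   # keep the residual in sync
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break                          # coefficients have stopped moving
    return beta

def lambda_grid(X, y, n_lambdas=10, ratio=1e-2):
    """Decreasing log-spaced grid from lam_max down to ratio * lam_max."""
    lam_max = np.max(np.abs(X.T @ y)) / X.shape[0]  # zero solution for lam >= lam_max
    return lam_max * np.logspace(0.0, np.log10(ratio), n_lambdas)

def cv_select(X, y, lambdas, k=5, seed=0):
    """Return the lambda with the smallest mean K-fold validation MSE."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    errors = np.zeros(len(lambdas))
    for fold in folds:
        train = np.setdiff1d(np.arange(n), fold)
        beta = None
        for i, lam in enumerate(lambdas):
            # One model solve per (fold, lambda); warm-start from the previous lambda.
            beta = lasso_cd(X[train], y[train], lam, beta0=beta)
            errors[i] += np.mean((y[fold] - X[fold] @ beta) ** 2) / k
    return lambdas[int(np.argmin(errors))], errors

if __name__ == "__main__":
    # Synthetic high-dimensional example (p > n) with 5 active features.
    rng = np.random.default_rng(0)
    n, p = 40, 60
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:5] = 3.0
    y = X @ beta_true + 0.1 * rng.standard_normal(n)
    lambdas = lambda_grid(X, y)
    best_lam, errors = cv_select(X, y, lambdas)
    print("selected lambda:", best_lam)
```

With K folds and m candidate values the model is solved K × m times, which is the cost that screening rules aim to reduce by discarding inactive features before each solve. The warm start used above is a separate, standard speed-up and is not a screening rule.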

Key words

Regularized regression model/Tuning parameter selection/Screening rule

Publication year

2024
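数值计算与计算机应用 (Journal on Numerical Methods and Computer Applications)
Academy of Mathematics and Systems Science, Chinese Academy of Sciences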

Impact factor: 0.188
ISSN:1000-3266