Nonparametric Feature Screening for Interaction Effects in Ultrahigh-dimensional Survival Data

Linear regression models are widely used in medicine, genetics, economics, and other fields to study relationships between variables for analysis and prediction. In many practical problems, however, main effects alone are not sufficient: interactions between variables can also have an important influence on the response, and an interaction model that includes both main effects and interaction effects describes the relationship between variables more comprehensively. For high-dimensional data, the number of variables p is large and the number of second-order interaction terms, p(p+1)/2, is larger still, so statistical analysis of the interaction model faces substantial difficulties. Selecting, from this huge number of candidates, the interaction effects that significantly affect the event of interest is therefore an important problem. Existing research on this problem focuses mainly on complete data under the linear model framework. This paper considers the problem for ultrahigh-dimensional right-censored survival data. Based on distance correlation and a two-step analysis procedure, we propose a screening method for interaction effects that does not depend on any model assumptions. The method selects important main effects and important interaction effects simultaneously and can handle ultrahigh-dimensional data with very large p. Extensive simulation studies evaluate the finite-sample performance of the proposed procedure, and the results show that it effectively selects important interaction effects in ultrahigh-dimensional right-censored survival data. As an illustration, we apply the method to the diffuse large-B-cell lymphoma (DLBCL) data.
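To make the distance-correlation idea concrete, the following is a minimal Python sketch of marginal screening by sample distance correlation. It is not the procedure proposed in the paper: the abstract does not say how right censoring is handled or how the two steps are organized, so the sketch assumes a fully observed response and simply ranks predictors. The function names (`distance_correlation`, `screen_by_distance_correlation`) and the `keep` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def _centered_distances(v):
    """Pairwise Euclidean distance matrix of v, double-centered."""
    v = np.asarray(v, dtype=float)
    if v.ndim == 1:
        v = v[:, None]  # treat a vector as n observations of one variable
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    row_mean = d.mean(axis=1, keepdims=True)
    col_mean = d.mean(axis=0, keepdims=True)
    return d - row_mean - col_mean + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic version), a value in [0, 1]."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()   # squared sample distance covariance
    dvar_x = (a * a).mean()  # squared sample distance variance of x
    dvar_y = (b * b).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0
    # max() guards against tiny negative values from floating-point error
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def screen_by_distance_correlation(X, y, keep):
    """Rank the columns of X by distance correlation with y; return the top `keep`.

    Assumption: y is fully observed. Censored survival times would need an
    adjustment that this sketch does not attempt.
    """
    X = np.asarray(X, dtype=float)
    scores = np.array(
        [distance_correlation(X[:, j], y) for j in range(X.shape[1])]
    )
    order = np.argsort(-scores)  # indices sorted by decreasing score
    return order[:keep], scores
```

Because distance correlation does not assume linearity, a predictor that acts on the response only through a nonlinear term (for example, a square) can still receive a high score. That is the motivation for using it in model-free screening. Each score costs O(n^2) time and memory, so for very large p the loop over columns is the dominant cost.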

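Keywords: interaction effect; ultrahigh-dimensional survival data; distance correlation; two-stage method; feature screening

Authors: Zhang Jing (张婧), Liu Yanyan (刘妍岩)

School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073

School of Mathematics and Statistics, Wuhan University, Wuhan 430072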

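Funding: National Natural Science Foundation of China (11971362, 11901581, 12371274); Natural Science Foundation of Hubei Province (2021CFB502); Fundamental Research Funds for the Central Universities, Zhongnan University of Economics and Law (2722024BY024)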

2024

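Acta Mathematica Sinica, Chinese Series (数学学报)
Institute of Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences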

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.261
ISSN: 0583-1431
Year, volume (issue): 2024, 67(3)