Nonparametric Feature Screening for Interaction Effects in Ultrahigh-dimensional Survival Data
Linear regression models are often used to study the relationship between variables in many fields of scientific research, such as medicine, genetics, and economics. However, main effects alone may not be sufficient to characterize the relationship between the response and the predictors in complex situations; in many practical problems, interaction effects between variables also have an important influence on the response. An interaction model, which considers both main effects and interaction effects, can describe the relationship between variables more comprehensively. For high-dimensional data, the number of variables p is large and the number of second-order interaction terms, p(p+1)/2, is much larger, so the statistical analysis of interaction models faces many difficulties and challenges. How to select, from a huge number of candidate interaction effects, the important ones that have a significant impact on the event of interest is an important problem. Existing research on this problem focuses mainly on complete data under the framework of the linear model. In this paper, we consider this problem for ultrahigh-dimensional right-censored survival data. Based on distance correlation and the two-step analysis method, we propose a model-free screening method for interaction effects that does not depend on any model assumptions. The method can select important main effects and important interaction effects simultaneously, and can handle ultrahigh-dimensional data with large p. Extensive simulation studies are carried out to evaluate the finite-sample performance of the proposed procedure, and the results show that the method can effectively select the important interaction effects for ultrahigh-dimensional right-censored survival data. As an illustration, we apply the proposed method to the diffuse large-B-cell lymphoma (DLBCL) data.
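To make the screening statistic concrete, the following is a minimal sketch of the sample distance correlation and of a marginal ranking built on it. It is an illustration under simplifying assumptions, not the procedure proposed in this paper: the response is treated as fully observed, so the adjustment for right censoring is omitted entirely, and the second step for interaction terms is not reproduced. The function names `distance_correlation` and `screen_by_dcor` and the parameter `keep` are hypothetical and chosen only for this example.

```python
import numpy as np


def _centered_dist(v):
    """Double-centered Euclidean distance matrix of a sample (n,) or (n, d)."""
    v = np.asarray(v, dtype=float)
    if v.ndim == 1:
        v = v[:, None]
    # Pairwise Euclidean distances via broadcasting: shape (n, n).
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    # Subtract row and column means, add back the grand mean.
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form), a value in [0, 1]."""
    a = _centered_dist(x)
    b = _centered_dist(y)
    dcov2 = (a * b).mean()      # squared sample distance covariance
    dvar_x = (a * a).mean()     # squared sample distance variance of x
    dvar_y = (b * b).mean()     # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom <= 0.0:
        return 0.0              # a constant sample carries no dependence
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def screen_by_dcor(X, y, keep):
    """Rank the columns of X by distance correlation with y (ASSUMPTION:
    y is uncensored) and return the indices of the `keep` largest scores."""
    scores = np.array([distance_correlation(X[:, j], y)
                       for j in range(X.shape[1])])
    order = np.argsort(-scores)
    return order[:keep], scores


if __name__ == "__main__":
    # Synthetic data, invented purely for illustration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    y = 3.0 * X[:, 3] + 0.1 * rng.normal(size=100)
    top, scores = screen_by_dcor(X, y, keep=5)
    print("retained columns:", top)
```

Because distance correlation is zero in the population only under independence, a ranking of this kind needs no regression model for the response, which is the sense in which such screening is called model-free. Each evaluation costs O(n^2) time and memory in this form, so for very large p one would normally compute the scores column by column, as above, rather than store all distance matrices at once.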