Optimal decorrelated score subsampling for generalized linear models with massive data

In this paper, we consider unified optimal subsampling estimation and inference on the low-dimensional parameter of main interest in the presence of a nuisance parameter for low- and high-dimensional generalized linear models (GLMs) with massive data. We first present a general subsampling decorrelated score function to reduce the influence of the less accurate nuisance parameter estimation, which has a slow convergence rate. The consistency and asymptotic normality of the resultant subsample estimator from a general decorrelated score subsampling algorithm are established, and two optimal subsampling probabilities are derived under the A- and L-optimality criteria to downsize the data volume and reduce the computational burden. The proposed optimal subsampling probabilities provably improve the asymptotic efficiency of the subsampling schemes in low-dimensional GLMs and perform better than the uniform subsampling scheme in high-dimensional GLMs. A two-step algorithm is further proposed for implementation, and the asymptotic properties of the corresponding estimators are also given. Simulations show satisfactory performance of the proposed estimators, and two applications to the census income and Fashion-MNIST datasets also demonstrate their practical applicability.
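As a rough illustration of the two-step idea mentioned in the abstract, the sketch below runs a generic two-step subsampling scheme for plain low-dimensional logistic regression. It is not the paper's decorrelated score procedure: there is no nuisance parameter and no decorrelation step, and the subsampling probabilities (proportional to the absolute residual times the covariate norm) are an assumed stand-in in the style of probabilities commonly used for logistic regression subsampling. The function names `weighted_logistic_fit` and `two_step_subsample` and the sizes `r0` and `r` are hypothetical.

```python
import numpy as np

def weighted_logistic_fit(X, y, w, n_iter=50, tol=1e-8):
    """Newton-Raphson for a weighted logistic log-likelihood (illustrative helper)."""
    d = X.shape[1]
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        # Tiny ridge term only guards against a numerically singular Hessian.
        step = np.linalg.solve(hess + 1e-10 * np.eye(d), grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def two_step_subsample(X, y, r0, r, rng):
    """Generic two-step subsampling sketch (assumed form, not the paper's algorithm)."""
    n = X.shape[0]
    # Step 1: uniform pilot subsample gives a rough estimate.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = weighted_logistic_fit(X[idx0], y[idx0], np.ones(r0))
    # Step 2: data-dependent probabilities built from the pilot estimate
    # (assumption: proportional to |residual| * ||x_i||).
    p = 1.0 / (1.0 + np.exp(-X @ beta0))
    score_norm = np.abs(y - p) * np.linalg.norm(X, axis=1)
    probs = score_norm / score_norm.sum()
    idx1 = rng.choice(n, size=r, replace=True, p=probs)
    # Inverse-probability weights correct for the non-uniform sampling.
    w = 1.0 / probs[idx1]
    w = w / w.mean()  # rescaling does not change the maximizer
    beta = weighted_logistic_fit(X[idx1], y[idx1], w)
    return beta, probs
```

With simulated data, a call such as `two_step_subsample(X, y, 500, 2000, np.random.default_rng(0))` returns the weighted subsample estimate and the full vector of subsampling probabilities. Only the second-step subsample of size `r` enters the final fit, which is where the computational saving over a full-data fit comes from.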

A-optimality; decorrelated score subsampling; high-dimensional inference; L-optimality; massive data

Junzhuo Gao, Lei Wang, Heng Lian

School of Statistics and Data Science & LPMC, Nankai University, Tianjin 300071, China

Department of Mathematics, City University of Hong Kong, Hong Kong, China

Fundamental Research Funds for the Central Universities; National Natural Science Foundation of China; Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin

12271272

2024

Science China Mathematics (English edition)
Chinese Academy of Sciences

CSTPCD
Impact factor: 0.36
ISSN: 1674-7283
Year, volume (issue): 2024, 67(2)