Online Estimation for Partially Linear Model in Data Streams
The partially linear model, an important class of semiparametric regression models, is widely used across many fields because it adapts flexibly to complex data structures. In the era of big data, however, research on and application of this model face multiple challenges, the most critical being computing speed and data storage. This study considers data streams that are observed continuously in the form of data blocks and proposes an online estimation method for both the parameters of the linear part and the unknown function of the nonlinear part of the partially linear model. The method enables real-time estimation using only the current data block and previously computed summary statistics. To verify its effectiveness, numerical simulations vary the size of each data block and the total sample size of the data stream in turn, and compare the bias, standard error, and mean squared error of the online estimation method with those of the traditional method. The experiments demonstrate that, compared with the traditional method, the proposed approach computes quickly and does not need to revisit historical data, while achieving a mean squared error close to that of the traditional method. Finally, using data from the Chinese General Social Survey (CGSS), this study applies the online estimation method to analyze the factors that influence the quality of life of the working-age population in China. The results indicate that full-time work of 30 to 60 hours per week contributes positively to quality of life, providing a useful reference for relevant policy formulation.
online estimation; partially linear model; kernel regression; big data; data compression
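
The idea of updating an estimate from only the current data block and stored summary statistics can be illustrated with a minimal sketch. It is not the estimator proposed in this study: it covers only a kernel regression component, using a Nadaraya-Watson estimate with a Gaussian kernel, and it assumes a fixed evaluation grid and a fixed bandwidth so that the summary statistics are simple running sums. The class name StreamingKernelRegression, the simulated data, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


class StreamingKernelRegression:
    """Nadaraya-Watson estimate on a fixed grid, updated block by block.

    Assumptions (illustrative only): Gaussian kernel, fixed bandwidth,
    fixed evaluation grid. Only two running sums per grid point are stored.
    """

    def __init__(self, grid, bandwidth):
        self.grid = np.asarray(grid, dtype=float)
        self.h = float(bandwidth)
        self.num = np.zeros_like(self.grid)  # running sum of K(.) * y
        self.den = np.zeros_like(self.grid)  # running sum of K(.)

    def update(self, t_block, y_block):
        """Fold one data block into the summary statistics."""
        t_block = np.asarray(t_block, dtype=float)
        y_block = np.asarray(y_block, dtype=float)
        # Kernel weights with shape (number of grid points, block size).
        u = (self.grid[:, None] - t_block[None, :]) / self.h
        w = np.exp(-0.5 * u**2)
        self.num += w @ y_block
        self.den += w.sum(axis=1)

    def estimate(self):
        """Current estimate of the regression function on the grid."""
        return self.num / self.den


# Simulated stream (hypothetical data): y = sin(2*pi*t) + noise.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, 1000)
y = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(1000)

grid = np.linspace(0.1, 0.9, 9)
bandwidth = 0.05

# Online pass: each block is seen once and then discarded.
model = StreamingKernelRegression(grid, bandwidth)
for t_b, y_b in zip(np.array_split(t, 10), np.array_split(y, 10)):
    model.update(t_b, y_b)
online = model.estimate()

# Batch reference: the same estimator computed from all data at once.
u_all = (grid[:, None] - t[None, :]) / bandwidth
w_all = np.exp(-0.5 * u_all**2)
batch = (w_all @ y) / w_all.sum(axis=1)

print(np.allclose(online, batch))
```

Under these simplifying assumptions the online and batch estimates coincide up to floating-point rounding, because the numerator and denominator are plain sums over observations. With a data-driven bandwidth or with joint estimation of the linear part, exact agreement is generally not available, which is why the comparison of mean squared error against the traditional method matters.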